Preface



The scale-space conference series dates back to the NSF/ESPRIT transatlantic 
collaboration on “geometry-driven diffusion” (1993-1996). This collaboration led 
to a series of very successful workshops followed by a PhD summer school on 
Gaussian Scale-Space Theory in Copenhagen, in spring 1996. After this, Bart 
ter Haar Romeny arranged the First International Conference on Scale-Space 
Theory in Computer Vision (Utrecht, July 2-4, 1997). Indeed the title was ap- 
propriate since this was the first scale-space conference in a series of so far two 
conferences. We feel very confident that the series will be much longer. We hope 
that the scheduling next to ICCV ’99 will attract more delegates furthering the 
integration of scale-space theories into computer vision. 

Since the first scale-space conference we have had an increase of more than 
50% in the number of contributions. Of 66 high-quality submissions, we could, 
due to the time limitation of the conference, only select 24 papers for oral presentations.
They form Part 1 of this volume. Many papers were of such high quality
that they would otherwise have qualified for oral presentation. It was decided
to include 12 of the remaining papers in full length in the proceedings, creating
the category of “Long Posters”. They form Part 2 of this volume. Finally, 18
papers were accepted for poster presentations, constituting Part 3. Invited talks
were given by Prof. Rüdiger von der Heydt, Department of Neuroscience, Johns
Hopkins University School of Medicine and Prof. David L. Donoho, Statistics 
Department, Stanford University. 

We would like to thank everyone who contributed to the success of this 
2nd conference on scale-space theories in computer vision; first of all the many 
authors for their excellent and timely contributions, the referees who in a very
short period reviewed the many papers (each paper was reviewed by 3 referees),
the members of the conference board, program board, and program committee, Ole
F. Olsen and Erik B. Dam for their administration and the work of collecting the
papers for this volume, ICCV and John Tsotsos for the very flexible hosting of 
the conference, and, last but not least, all who otherwise participated in making 
the conference successful. 
Blur and Disorder



Jan J. Koenderink and Andrea J. van Doorn



Department of Physics and Astronomy, 
PO Box 80 000, 3508TA Utrecht, 
The Netherlands 



Abstract. Blurring is not the only way to selectively remove fine spa- 
tial detail from an image. An alternative is to scramble pixels locally 
over areas defined by the desired blur circle. We refer to such scram- 
bled images as “locally disorderly”. Such images have many potentially 
interesting applications. In this contribution we discuss a formal frame- 
work for such locally disorderly images. It boils down to a number of 
intricately intertwined scale spaces, one of which is the ordinary linear 
scale space for the image. The formalism is constructed on the basis of 
an operational definition of local histograms of arbitrary bin width and 
arbitrary support. 



1 Introduction 

Standing in front of a tree you may believe that you clearly see the leaves.
You also see “foliage” (which is a kind of texture), the general shape of the
treetop, and so forth [3,10]. You will be hard put to keep a given leaf in mind
(say) and retrieve it the next day, or even a minute later. You have no idea
of the total number of leaves. If you glanced away and we removed a few
leaves, you would hardly notice. Indeed, it seems likely that you wouldn’t notice
quite serious changes in the foliage at all: One branch tends to look much like
any other. This is what John Ruskin [13,14] called “mystery”. Yet you would
notice if the treetop were replaced with a balloon of the same shape and
identical average color. The foliage texture is important (it makes the foliage
look like what it is) even though the actual spatial structure is largely ineffective
in human vision.

Something quite similar occurs with images. Given an image we may scram- 
ble its pixels or replace all pixels with the average taken over all pixels. Both 
operations clearly destroy all spatial information. Yet the scrambled picture contains
more information than the blurred one. When the blurring or scrambling
is done locally at many places in a large image, one obtains something like a
blurred photograph in the one case and something like a “painterly rendering” in
the other. Although the painterly rendering does not reveal more spatial detail than the
blurred one, it tells you more about the rendered scene. We call such images
“locally disorderly”. We became interested because human vision is locally disordered
in the peripheral visual field. In pathological cases the disorder may even dominate
focal vision [4,6]. For instance, in cases of (always unilateral) “scrambled vision”
the affected visual field is useless for, say, reading newspapers (even the headlines). Yet
such patients can discriminate even the highest spatial frequencies from a uniform
field: In that respect the scrambled eye is on a par with the good one. There is
nothing wrong with the optics of such eyes, nor with the sampling. The patients
simply don’t know where the pixels are; it is a mental deficiency, not an optical
or neurophysiological disorder.
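To make the contrast between local blurring and local scrambling concrete, here is a minimal NumPy sketch. It uses square tiles as a crude stand-in for the blur circle (an assumption of ours), and the helper names `scramble_tiles` and `average_tiles` are hypothetical. Both operations destroy the spatial layout inside a tile, but only the scrambled version keeps each tile's histogram.

```python
import numpy as np

def scramble_tiles(img, tile, rng):
    """Randomly permute the pixels inside each tile x tile block (local disorder)."""
    out = img.copy()
    rows, cols = img.shape
    for y in range(0, rows, tile):
        for x in range(0, cols, tile):
            block = img[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] = rng.permutation(block.ravel()).reshape(block.shape)
    return out

def average_tiles(img, tile):
    """Replace each tile x tile block by its mean value (a crude, total local blur)."""
    out = img.astype(float)
    rows, cols = img.shape
    for y in range(0, rows, tile):
        for x in range(0, cols, tile):
            out[y:y + tile, x:x + tile] = img[y:y + tile, x:x + tile].mean()
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32))
scrambled = scramble_tiles(img, 8, rng)
averaged = average_tiles(img, 8)
```

Every tile of `scrambled` holds exactly the same multiset of values as the original tile, whereas every tile of `averaged` has collapsed to a single value.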

Apart from the natural urge to understand such locally disorderly images 
formally, they may have many applications in computer graphics (why compute 
all leaves on a tree when a painter can do without?), image compression (why 
encode what the human observer doesn’t notice?), and so forth. However, our 
interest is mainly fundamental. 



2 Local histograms 

For an image showing an extensive scene the histogram of the full image makes 
little sense because it is a hodge-podge of mutually unrelated entities. Far more 
informative are histograms of selected regions of interest that show certain uniform
regions (where the precise concept of “uniformity” may vary). Such a region
may be part of a treetop for instance. A pixel histogram still might show quite 
different structure according to the size of the leaf images with respect to the 
pixel size. In order to understand the structure of the histogram one has to spec- 
ify both the region of interest and the spatial resolution. Finally, the result will 
depend upon the resolution in the intensity domain: The histogram for a binary 
image, an eight-bit or a thirty-two-bit image will all look different. We will refer
to this as the bin width. Thus one needs at least the following parameters to 
specify a local histogram: The location and size of the region of interest (the 
support), the spatial resolution, and the bin width (that is the resolution in the 
intensity domain). We will regard the histogram as a function of position, thus 
one obtains a histogram valued image. 

In order to construct histogram valued images one has to measure how much 
“stuff” should go in a certain bin for a certain region of interest, at a certain res- 
olution. The spatial distribution of this “stuff” is what makes up the histogram 
valued image for a certain region size, resolution, bin width and intensity value. 
For each intensity value (keeping the other parameters constant) one has such a 
stuff distribution image. 

Here is how one might construct a stuff distribution image: First one constructs
the image at the specified level of resolution. Formally [7,9,1] this can be
written as a convolution (here “⊗” denotes the convolution operator)

$$ I(\mathbf{r};\sigma) = S(\mathbf{r}) \otimes G_0(\mathbf{r};\sigma) \tag{1} $$

of the “scene” ($S(\mathbf{r})$; at infinite resolution!) with a Gaussian kernel

$$ G_0(\mathbf{r};\sigma) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{\mathbf{r}\cdot\mathbf{r}}{2\sigma^2}} \tag{2} $$
where $\sigma$ denotes the resolution. Of course this is to be understood as merely
formal: In reality one observes the image; the scene is implicit and cannot be
observed other than via images. It is like the horizon that can never be reached.

Next one applies the following nonlinear transformation to the intensity of
each pixel:

$$ F(i; i_0, \beta) = e^{-\frac{(i - i_0)^2}{2\beta^2}} \tag{3} $$

here $i_0$ denotes the fiducial intensity value and $\beta$ denotes the bin width. Notice
that the resulting image has pixel values in the range zero to one. These values
can be regarded as the value of the membership function of a given pixel to the
specified bin. Finally one blurs the result, taking the size of the region of interest
as internal scale. The kernel to be used is

$$ A(\mathbf{r}; \mathbf{r}_0, \alpha) = e^{-\frac{(\mathbf{r}-\mathbf{r}_0)\cdot(\mathbf{r}-\mathbf{r}_0)}{2\alpha^2}} \tag{4} $$



The function $A$ is the “aperture function” which defines the (soft) region of interest.
It is not normalized, thus the total amount of stuff that gets into a bin will
be proportional to the area of the aperture. This distributes the membership
over the region of interest. Thus the local histograms are defined as

$$ H(i; \mathbf{r}_0, \sigma, \beta, \alpha) = \frac{1}{2\pi\alpha^2} \int_{\text{image}} A(\mathbf{r}; \mathbf{r}_0, \alpha)\, e^{-\frac{(I(\mathbf{r};\sigma) - i)^2}{2\beta^2}}\, d\mathbf{r} \tag{5} $$

Here we have normalized with respect to the area of the region of interest. The
resulting image will have an overall level of (nearly) zero, with a certain diffuse
ribbon-like (or curve-like) object that may reach values up to one. This ribbon is
intuitively a representation of a “soft isophote”. It is in many respects a more useful
object than a true “isophote”. For instance, in regions where the intensity is roughly
constant the “true” isophotes are ill defined and often take on a fractal character.
The smallest perturbation may change their nature completely. In such regions 
the ribbon spreads out and fills the roughly uniform region with a nonzero but 
low value: The membership is divided over the pixels in the uniform region. 
For a linear gradient, the steeper the gradient, the narrower the isophote. (See 
figures 1, 2 and 3.) 
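The construction of equations (1) to (5) can be sketched in a few lines of NumPy. This is an illustration under our own assumptions, not the authors' implementation: the scene is a sampled array, Gaussian blurs are truncated at four standard deviations with reflected borders, and the discrete unit-sum aperture kernel stands in for the $1/(2\pi\alpha^2)$ normalization of eq. (5). The names `gaussian_blur` and `stuff_images` are hypothetical.

```python
import numpy as np

def gaussian_blur(img, s):
    """Separable, unit-sum Gaussian blur with reflected borders (s <= 0: no blur)."""
    img = np.asarray(img, dtype=float)
    if s <= 0:
        return img.copy()
    r = int(np.ceil(4 * s))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * s ** 2))
    k /= k.sum()
    out = np.pad(img, r, mode="reflect")
    out = np.apply_along_axis(np.convolve, 0, out, k, mode="valid")
    return np.apply_along_axis(np.convolve, 1, out, k, mode="valid")

def stuff_images(scene, levels, sigma, beta, alpha):
    """One 'histogram stuff' image per fiducial level i0, following eqs. (1)-(5)."""
    image = gaussian_blur(scene, sigma)                              # eq. (1): I(r; sigma)
    stack = []
    for i0 in levels:
        membership = np.exp(-(image - i0) ** 2 / (2 * beta ** 2))   # eq. (3): values in (0, 1]
        stack.append(gaussian_blur(membership, alpha))              # eqs. (4)-(5)
    return np.stack(stack)                                          # (len(levels), rows, cols)

ramp = np.tile(np.linspace(0.0, 255.0, 64), (32, 1))                # linear gradient 0..255
H = stuff_images(ramp, levels=[107, 127, 147], sigma=1.0, beta=8.0, alpha=4.0)
peak_cols = [int(np.argmax(H[k, 16])) for k in range(3)]
```

On a linear ramp the soft isophote for each fiducial level sits where the ramp crosses that level, so the peak column moves to the right as $i_0$ increases, much like the shift noted for figure 1.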

Notice that we deal with a number of closely intertwined scale spaces here.
First there is the regular scale space of the image at several levels of resolution.
Here the extentional parameter is $\mathbf{r}$ and the scale parameter $\sigma$. Then there is the
scale space of histograms. Here the “extentional” parameter is the intensity $i$
whereas the “resolution” parameter is the bin width $\beta$. It is indeed (by construction)
a neat linear scale space. One may think of it as the histogram at maximum
resolution, blurred with a Gaussian kernel at the bin width. Thus each pixel in
the image may be thought of as contributing a delta pulse at its intensity; these
are then blurred to obtain the histogram: This is formally identical to the Parzen
estimator [12] of the histogram. Finally, there is the scale space of stuff images
with the size of the region of interest as inner scale. The extentional parameter
is $\mathbf{r}_0$ and the scale parameter $\alpha$.
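The Parzen-estimator reading of the histogram scale space can be checked numerically. The NumPy sketch below assumes 8-bit integer pixel values and uses hypothetical function names. It computes the soft histogram once by summing a Gaussian pulse of width $\beta$ per pixel and once by blurring the exact (delta) histogram along the intensity axis; the two coincide.

```python
import numpy as np

def parzen_histogram(pixels, levels, beta):
    """Soft histogram: each pixel adds a Gaussian pulse of width beta at its own value."""
    values = np.asarray(pixels, dtype=float).ravel()
    diff = values[:, None] - np.asarray(levels, dtype=float)[None, :]
    return np.exp(-diff ** 2 / (2 * beta ** 2)).sum(axis=0)

def blurred_delta_histogram(pixels, levels, beta, n_values=256):
    """Exact histogram of integer pixel values, blurred along the intensity axis."""
    counts = np.bincount(np.asarray(pixels).ravel(), minlength=n_values).astype(float)
    values = np.arange(n_values, dtype=float)
    kernel = np.exp(-(values[:, None] - np.asarray(levels, dtype=float)[None, :]) ** 2
                    / (2 * beta ** 2))
    return counts @ kernel

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(48, 48))
levels = np.arange(256.0)
parzen = parzen_histogram(img, levels, beta=8.0)
blurred = blurred_delta_histogram(img, levels, beta=8.0)
```

Both functions evaluate the same sum, grouped either by pixel or by intensity value, which is why the tonal direction behaves as an ordinary linear scale space.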
Fig. 1. An image and three distributions of “histogram stuff”. The bin width is 8,
fiducial levels are 107, 127 and 147 (intensity ranges from 0 to 255). Notice the shift 
of the isophote with level. In this respect the soft isophotes behave exactly like regular 
isophotes (figure 3). 




Fig. 2. Four distributions of “histogram stuff”. The image is shown in figure 1 on 
the left. Fiducial level is 127, bin widths are 8, 16, 32 and 64. Notice that though 
the location of the soft isophote is constant, its width increases as the bin width is 
increased. The soft isophotes can be quite broad “ribbons”, in fact there is no limit, 
they may fill the whole image. 




Fig. 3. Three regular isophotes of the image shown in figure 1 on the left. Notice that 
the isophotes are not smooth curves but have a fractal character. Compare with the 
soft isophotes in figure 2. 
3 Locally disorderly images

Consider an image and its histogram (in the classical sense, no scale space). 
When we do arbitrary permutations on the pixels we obtain images that look 
quite distinct (the spatial structure can differ in arbitrary ways), yet they all 
have the same histogram. Thus the histogram can be regarded as the image 
modulo its spatial structure. That is why we refer to such images as “disorderly”
in the first place. Histograms are zeroth-order summary descriptions of
images, the conceptually simplest type of “texture”. A histogram-valued image
is likewise to be considered “locally disorderly”. Notice that a disorderly image is
not completely lacking in spatial discrimination. For instance, consider an image 
that consists of vertical stripes at the maximum frequency, say pixels with even 
horizontal index white, with odd index black. When this image is scrambled it 
can still be discriminated from a uniform image that is uniformly fifty percent 
gray. On the other hand, this discrimination fails when we completely blur the 
image, that is say, replace every pixel value with the average over all pixels. Yet 
the rudimentary spatial discrimination is not perfect, for instance one cannot 
discriminate between vertical and horizontal stripes. 
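The stripe example can be phrased in terms of the histogram alone. In the NumPy sketch below (our own illustration), an image “modulo its spatial structure” is represented by its sorted pixel values, and `orderless` is a hypothetical helper. Vertical and horizontal stripes are indistinguishable this way. The stripes and a uniform gray field with the same mean are not.

```python
import numpy as np

def orderless(img):
    """The image modulo its spatial structure: its sorted multiset of pixel values."""
    return np.sort(np.asarray(img, dtype=float).ravel())

n = 16
vertical = np.tile(np.where(np.arange(n) % 2 == 0, 255, 0), (n, 1))  # even columns white
horizontal = vertical.T                                               # same stripes, rotated
gray = np.full((n, n), 127.5)                                         # uniform 50% gray

same_as_horizontal = np.array_equal(orderless(vertical), orderless(horizontal))
same_as_gray = np.array_equal(orderless(vertical), orderless(gray))
```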

Quite apart from spatial discrimination, locally disorderly images retain much 
information that exists at a scale that is not resolved and would be lost in a totally
blurred image [11,2]. The type of information that is retained often lets
you recognize material properties that would be lost by total blurring. Locally 
disorderly images are very similar to “painterly” renderings. In an impressionist 
painting there is often no spatial information on the scale of the brush strokes 
(touches). At this scale one has structure that reveals the facture, often a signa- 
ture of the artist. Yet the strokes contribute much to the realistic rendering. The 
artist doesn’t paint leaves when he/she paints a treetop, he/she “does foliage”. 
Although no leaves are ever painted you can often recognize the genus by the 
way the foliage is handled. Pointillists were quite explicit about such things. For 
instance, Seurat [5] lists the colors of touches that go in a rendering of grass. He
never paints leaves of grass, yet the grass is not uniformly green: There are blues 
(shadow areas), yellows (translucency) and oranges (glints of sunlight). These 
touches appear in a locally disorderly pattern. 

Locally disorderly images can be rendered (see figure 4) by using the his- 
tograms as densities for a random generator. Every instance will be different, 
their totality is the locally disorderly image. Although all instances are different, 
they look the same. Indeed, they are the same on the scale where disorder gives 
way to order, and on finer scales the histograms are identical. Such renderings 
are not unlike “dithered” representations[15, 16]. 
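Rendering an instance amounts to drawing each pixel's value from its own local histogram, used as a probability density. The NumPy sketch below shows one way to do this under our own assumptions: the input is taken as already blurred to the desired resolution (no $\sigma$ step), the bins are the integer intensities, sampling uses the inverse cumulative distribution, and `render_disorderly` and `gaussian_blur` are hypothetical helpers. Different random generators give different instances of the same locally disorderly image.

```python
import numpy as np

def gaussian_blur(img, s):
    """Separable, unit-sum Gaussian blur with reflected borders."""
    r = int(np.ceil(4 * s))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * s ** 2))
    k /= k.sum()
    out = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    out = np.apply_along_axis(np.convolve, 0, out, k, mode="valid")
    return np.apply_along_axis(np.convolve, 1, out, k, mode="valid")

def render_disorderly(image, levels, beta, alpha, rng):
    """Sample every pixel from its local histogram (soft bins of width beta, aperture alpha)."""
    levels = np.asarray(levels, dtype=float)
    H = np.stack([gaussian_blur(np.exp(-(image - i0) ** 2 / (2 * beta ** 2)), alpha)
                  for i0 in levels])
    cdf = np.cumsum(H, axis=0)
    cdf /= cdf[-1]                                          # per-pixel cumulative distribution
    u = rng.random(image.shape)
    idx = np.minimum((u[None] > cdf).sum(axis=0), len(levels) - 1)  # inverse-CDF sampling
    return levels[idx]

image = np.full((32, 64), 50.0)
image[:, 32:] = 200.0                                       # a dark and a light region
out = render_disorderly(image, np.arange(256.0), beta=2.0, alpha=2.0,
                        rng=np.random.default_rng(0))
```

Near the boundary between the two regions, pixels come out as either about 50 or about 200, never in between. The boundary is disordered, but no intermediate values are invented.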

4 Generalizations 

In this paper we have considered histograms of pixel values. It is of course an 
obvious generalization to include local histograms of derivatives or (better) dif- 
ferential invariants. Then one essentially considers local histograms of the local 
Fig. 4. Left: Original image; Middle: Blurred image; Right: Locally disorderly image.
The blur and local disorder destroy roughly the same spatial detail. Notice that the 
locally disordered image looks “better” than the blurred one and indeed retains more 
information, for instance the bright pixels of the specularity on the nose. 



jets up to some order [8], or perhaps the jets modulo some transformation, say 
a rotation or a left/right symmetry. Such higher order locally disorderly images 
specify the distribution of textural qualities. Such structures have indeed been
used (albeit in an ad hoc fashion) to create instances of textures given a single
prototypical example. The local disorder is to be considered “zeroth order tex- 
ture”: The local image structure is summarized via the intensity values (zeroth 
order) alone. Adding gradient information (histograms of the first order “edge 
detectors”) would yield “first order texture”, and so forth. 

The formal description of such structures appears straightforward and would 
indeed yield a very attractive and (most likely) useful description of image struc- 
ture. 

5 Disorderly images and segmentation 

When the boundary between two uniform regions is blurred the pixels appear 
to lose their origin. For instance, blurring the boundary between a uniformly 
red and a uniformly green field yields a ribbon of yellow pixels that appear to 
belong to neither side. This has been a major incentive for the construction of 
nonlinear blurring methods: The idea was to blur areas, but not over edges. With 
locally disorderly images this dilemma is automatically solved in an unexpected 
way. Although the boundary indeed becomes spatially less and less defined, 
the individual pixels hold on to their origin. In the example given above no 
yellow pixels are ever generated. One doesn’t know where the red area ends 
and the green begins because there are red pixels to be found in the green area 
and vice versa. But the pixels themselves don’t change their hue. In terms of 
segmentation the red and green areas may be said to overlap in the boundary 
area. 
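The red/green example can be reproduced on a single row of RGB pixels. In this NumPy sketch (our own illustration), blurring is a box average. Local disorder is modeled, as an assumption, by letting each pixel copy a randomly chosen pixel from the same window, which amounts to sampling from a box-shaped local histogram. `box_blur` and `local_disorder` are hypothetical names.

```python
import numpy as np

red, green = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
strip = np.array([red] * 16 + [green] * 16)        # one row of RGB pixels, boundary at 16

def box_blur(strip, radius):
    """Average each pixel with its neighbours: creates mixed colors at the boundary."""
    padded = np.pad(strip, ((radius, radius), (0, 0)), mode="edge")
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    return np.stack([np.convolve(padded[:, c], k, mode="valid") for c in range(3)], axis=1)

def local_disorder(strip, radius, rng):
    """Each pixel copies a random pixel from its window: values move but are never mixed."""
    n = len(strip)
    src = np.clip(np.arange(n) + rng.integers(-radius, radius + 1, size=n), 0, n - 1)
    return strip[src]

blurred = box_blur(strip, 4)
disordered = local_disorder(strip, 4, np.random.default_rng(0))
```

Near the boundary the blurred strip contains mixtures with substantial red and green components (yellowish in additive RGB). Every pixel of the disordered strip, by contrast, is still either pure red or pure green.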

Fig. 6. Segments belonging to the major modes of the histogram shown in figure 5.
These are the illuminated and the shadow side of the face. The segments overlap over
the ridge of the nose.

In figures 5 and 6 we show a segmentation of a face on the basis of major
modes in the local histograms. The light and dark halves of the face are
segmented. There is some minor overlap on the bridge of the nose. In figure 7 we
show a more extreme case: Here the segments belonging to two major modes 
are extensively overlapping. This example illustrates that the locally disorderly 
images can be thought of as entertaining multiple intensity levels at any given 
location in the image[ll,2]. For instance, one may take the major modes in the 
local histograms as intensity values. When there exists only a single major mode 
the representation is much like the conventional one (one intensity per pixel). 
But if there are two or more major modes the advantage of the locally disor- 
derly representation becomes clear. In figure 7 the person apparently wears a 
dress that is light and dark at the same time! Of course this occurs only when 
the stripes of the dress have been lost in the disorder. In such cases the difference
between the blurred and locally scrambled versions of an image is of
course dramatic (figure 8).
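Reading off the major modes of a local histogram is straightforward once the histogram is sampled at a set of levels. The sketch below makes several assumptions of its own: it takes $\sigma = 0$, it calls a mode “major” when it is a local maximum reaching at least 20% of the highest bin (an arbitrary threshold), and its helper names are hypothetical. On a synthetic image with a striped region and a plain region, the striped location has two major modes (it is light and dark at once), while the plain location has one.

```python
import numpy as np

def gaussian_blur(img, s):
    """Separable, unit-sum Gaussian blur with reflected borders."""
    r = int(np.ceil(4 * s))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * s ** 2))
    k /= k.sum()
    out = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    out = np.apply_along_axis(np.convolve, 0, out, k, mode="valid")
    return np.apply_along_axis(np.convolve, 1, out, k, mode="valid")

def local_histograms(image, levels, beta, alpha):
    """Stuff images at each level (sigma = 0), as in eqs. (3)-(5)."""
    return np.stack([gaussian_blur(np.exp(-(image - i0) ** 2 / (2 * beta ** 2)), alpha)
                     for i0 in levels])

def major_modes(h, levels, rel_threshold=0.2):
    """Levels at local maxima of the 1-D histogram h that reach rel_threshold * max(h)."""
    left, right = np.r_[-np.inf, h[:-1]], np.r_[h[1:], -np.inf]
    return levels[(h > left) & (h >= right) & (h >= rel_threshold * h.max())]

rows = np.arange(32)[:, None]
image = np.where((rows // 2) % 2 == 0, 0.2, 0.8) * np.ones((32, 64))  # dark/light stripes
image[:, 32:] = 0.5                                                    # a plain region
levels = np.linspace(0.0, 1.0, 51)
H = local_histograms(image, levels, beta=0.05, alpha=3.0)
modes_striped = major_modes(H[:, 16, 10], levels)
modes_plain = major_modes(H[:, 16, 50], levels)
```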




Fig. 7. An example of “transparent segments”. On the left the original image. In 
the middle the segment belonging to the light stripes of the dress, on the right that 
belonging to the dark stripes. Note that in the latter case the dark hair is included in 
the segment, in the former case the light interstice between the arm and the leg. In 
the body part the dress is (at this resolution) “simultaneously light and dark”: In the 
disorderly representation the locations of the stripes are lost. 




Fig. 8. Blurred and locally scrambled versions of the striped dress image. 
6 Conclusion

Locally disorderly images admit of a neat formal description in terms of a num- 
ber of intimately interlocked scale spaces. They can easily be rendered and can 
thus be used in image compression. They have distinct advantages over blurred
images when one is interested in image segmentation, because segments may overlap
instead of being separated by a band of mixed pixels. This is often desirable; for
instance, one may treat the blue sky as continuous even where it is partly
occluded by numerous twigs when seen through a treetop. Thus locally disorderly
images may well find applications in image processing and interpretation. 
Abstract. In a recent work [1], Koenderink and van Doorn consider
a family of three intertwined scale-spaces coined the locally orderless
image (LOI). The LOI represents the image, observed at inner scale $\sigma$,
as a local histogram with bin-width $\beta$, at each location, with a Gaussian-shaped
region of interest of extent $\alpha$. LOIs form a natural and elegant
extension of scale-space theory, show causal consistency and enable the 
smooth transition between pixels, histograms and isophotes. The aim of 
this work is to demonstrate the wide applicability and versatility of LOIs. 
We consider a range of image processing tasks, including variations of 
adaptive histogram equalization, several methods for noise and scratch 
removal, texture rendering, classification and segmentation. 



1 Introduction 

Histograms are ubiquitous in image processing. They embody the notion that 
for many tasks, it is not the spatial order but the intensity distribution within 
a region of interest that contains the required information. One can argue that 
even at a single location the intensity has an uncertainty, and should therefore 
be described by a probability distribution: physical plausibility requires non-zero 
imprecision. This led Griffin [2] to propose a scale-imprecision space with spatial
scale parameter $\sigma$ and an intensity, or tonal, scale $\beta$, which can be identified with
the familiar bin-width of histograms.

Koenderink and van Doorn [1] extended this concept to locally orderless images
(LOIs), an image representation with three scale parameters in which there
is no local but only a global topology defined. LOIs are local histograms, con- 
structed according to scale-space principles, viz. without violating the causality 
principle. As such, one can apply to LOIs the whole machinery of techniques 
that has been developed in the context of scale-space research. 

In this paper, we aim to demonstrate that LOIs are a versatile and flexible
framework for image processing applications. The reader may regard this
article as a broad feasibility study. Due to space limitations, we cannot give thorough
evaluations for each application presented. Obviously, local histograms are
in common use, and the notion to consider histograms at different scales (soft 
binning) isn’t new either. Yet we believe that the use of a consistent mathe- 
matical framework in which all scale parameters are made explicit can aid the 
design of effective algorithms by reusing existing scale-space concepts. Additional 
insight may be gained by taking into account the behavior of LOIs over scale. 



M. Nielsen et al. (Eds.): Scale-Space’99, LNCS 1682, pp. 10-21, 1999.
© Springer-Verlag Berlin Heidelberg 1999
2 Locally orderless images



We first briefly review locally orderless images [1] by considering the scale parameters
involved in the calculation of a histogram:

- the inner scale $\sigma$ with which the image is observed;

- the outer scale, or extent, or scope $\alpha$ that parameterizes the size of the field
of view over which the histogram is calculated;

- the scale at which the histogram is observed, tonal scale, or bin-width $\beta$.

The locally orderless images $H(\mathbf{x}_0, i; \sigma, \alpha, \beta)$ are defined as the family of
histograms, i.e. functions of the intensity $i$, with bin-width $\beta$, of the image
observed at scale $\sigma$, calculated over a field of view centered around $\mathbf{x}_0$ with extent
$\alpha$. The unique way to decrease resolution without creating spurious resolution
is by convolution with Gaussian kernels [3] [4]. Therefore Gaussian kernels are
used for $\sigma$, $\alpha$ and $\beta$. We summarize this with a recipe for calculating LOIs:



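1. Choose an inner scale $\sigma$ and blur the image $L(\mathbf{x};\sigma)$ using the diffusion

$$ \Delta_{\mathbf{x}} L(\mathbf{x};\sigma) = \frac{\partial L(\mathbf{x};\sigma)}{\partial \left(\frac{\sigma^2}{2}\right)} \tag{1} $$

2. Choose a number of (equally spaced) bins of intensity levels $i$ and calculate
the “soft isophote images”, representing the “stuff” in each bin, through the
Gaussian gray-scale transformation

$$ R(\mathbf{x}, i; \sigma, \beta) = \exp\left(-\frac{(L(\mathbf{x};\sigma) - i)^2}{2\beta^2}\right) \tag{2} $$

3. Choose a scope $\alpha$ for a Gaussian aperture, normalized to unit amplitude

$$ A(\mathbf{x}; \mathbf{x}_0, \alpha) = \exp\left(-\frac{(\mathbf{x}-\mathbf{x}_0)\cdot(\mathbf{x}-\mathbf{x}_0)}{2\alpha^2}\right) \tag{3} $$

and compute the locally orderless image through convolution

$$ H(\mathbf{x}_0, i; \sigma, \alpha, \beta) = A(\mathbf{x}; \mathbf{x}_0, \alpha) * R(\mathbf{x}, i; \sigma, \beta). \tag{4} $$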

Note that $H(\mathbf{x}_0, i; \sigma, \alpha, \beta)$ is a stack of isophote images, and therefore has a
dimensionality one higher than that of the input image.
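The four-step recipe can be followed literally in NumPy. In the sketch below, step 1 is solved as the heat equation with explicit finite differences and zero-flux borders, using a time step of at most 0.2 (our choice; the explicit 2-D scheme is stable up to 0.25). Steps 3 and 4 use the unit-amplitude aperture of eq. (3), so $H$ is not normalized. The function names are hypothetical, and this is a sketch rather than the authors' code.

```python
import numpy as np

def diffuse(image, sigma, max_step=0.2):
    """Step 1: solve dL/dt = Laplacian(L) explicitly up to t = sigma**2 / 2."""
    L = np.asarray(image, dtype=float).copy()
    t_end = sigma ** 2 / 2
    n_steps = int(np.ceil(t_end / max_step))
    for _ in range(n_steps):
        P = np.pad(L, 1, mode="edge")                    # zero-flux (Neumann) borders
        lap = P[:-2, 1:-1] + P[2:, 1:-1] + P[1:-1, :-2] + P[1:-1, 2:] - 4 * L
        L += (t_end / n_steps) * lap                     # step <= 0.2 < 0.25: stable in 2-D
    return L

def aperture_convolve(img, alpha):
    """Steps 3-4: convolve with the unit-amplitude aperture exp(-|x - x0|^2 / (2 alpha^2))."""
    r = int(np.ceil(4 * alpha))
    x = np.arange(-r, r + 1)
    a = np.exp(-x ** 2 / (2 * alpha ** 2))               # 1-D factor; A(x, y) = a(x) a(y)
    out = np.pad(img, r, mode="reflect")
    out = np.apply_along_axis(np.convolve, 0, out, a, mode="valid")
    return np.apply_along_axis(np.convolve, 1, out, a, mode="valid")

def loi(image, levels, sigma, alpha, beta):
    L = diffuse(image, sigma)                                        # step 1, eq. (1)
    R = [np.exp(-(L - i) ** 2 / (2 * beta ** 2)) for i in levels]    # step 2, eq. (2)
    return np.stack([aperture_convolve(r, alpha) for r in R])        # steps 3-4, eqs. (3)-(4)

delta = np.zeros((33, 33))
delta[16, 16] = 1.0
L = diffuse(delta, sigma=2.0)
offsets = np.arange(33) - 16
var_x = (L.sum(axis=0) * offsets ** 2).sum() / L.sum()
H = loi(delta, levels=np.linspace(0.0, 1.0, 11), sigma=2.0, alpha=3.0, beta=0.1)
```

For a unit impulse, the explicit scheme conserves the total intensity and produces a spread with variance exactly $\sigma^2$ along each axis. This is a quick check that stopping at $t = \sigma^2/2$ matches eq. (1). The resulting stack has one more dimension than the input.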

The term locally orderless image refers to the fact that we have at each
location the probability distribution at our disposal, which is a mere orderless
set; the spatial structure within the field of view $\alpha$ centered at $\mathbf{x}_0$ has been
obliterated. This is the key point: instead of a (scalar) intensity, we associate
a probability distribution with each spatial location, parameterized by $\sigma$, $\alpha$, $\beta$.
Since a distribution contains more information than the intensity alone, we may
hope to be able to use this information in various image processing tasks.

The LOI contains several conventional concepts. The original image and its
scale-space $L(\mathbf{x};\sigma)$ can be recovered, up to normalization and in the limit $\alpha \to 0$,
by integrating $i\,H(\mathbf{x}_0, i; \sigma, \alpha, \beta)$ over $i$.
The “conventional” histogram is obtained by letting $\alpha \to \infty$. The construction
also includes families of isophote images, which for $\beta > 0$ are named soft isophote
images by Koenderink. Perhaps even more important, by tuning the scale
parameters the LOI can fill intermediate stages between the image, its histogram
and its isophotes. This can be useful in practice. The framework generalizes
trivially to nD images or color images, if a color metric is selected.



3 Median and maximum mode evolution 

If we replace the histogram at each location with its mean, we obtain the input
image $L(\mathbf{x};\sigma)$ blurred with a kernel of width $\alpha$. This holds independently of
$\beta$, since blurring a histogram does not alter its mean. If, however, we replace
the histogram with its median or its maximum mode (the intensity with highest
probability), we obtain a diffusion with scale parameter $\alpha$ that is reminiscent of
some non-linear diffusion schemes. The tonal scale $\beta$ works as a tuning parameter
that determines the amount of non-linearity. For $\beta \to \infty$, the median and the
maximum mode are equal to the mean, so the diffusion is linear. Griffin [2] has
studied the evolution of the median and of the stable mode (defined as the mode
surviving as $\beta$ increases), which is usually equal to the maximum mode. He
always sets $\alpha = \sigma$. This ensures that for $\alpha \to \infty$ the image attains its mean
everywhere, as in linear diffusion. With only a few soft isophote level images in
the LOI, maximum mode diffusion also performs a kind of quantization, and one
obtains piecewise homogeneous patches with user-selectable values. This can be
useful, e.g. in coding for data compression and knowledge-driven enhancements.
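The three summaries can be read directly from a sampled LOI. The NumPy sketch below is our own illustration, with $\sigma = 0$ and hypothetical helper names. The levels extend beyond the intensity range so that truncating the soft bins does not bias the mean. On a step edge, the mean reproduces the image blurred at scale $\alpha$, whereas the maximum mode keeps the edge sharp.

```python
import numpy as np

def gaussian_blur(img, s):
    """Separable, unit-sum Gaussian blur with reflected borders."""
    r = int(np.ceil(4 * s))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * s ** 2))
    k /= k.sum()
    out = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    out = np.apply_along_axis(np.convolve, 0, out, k, mode="valid")
    return np.apply_along_axis(np.convolve, 1, out, k, mode="valid")

def loi(image, levels, beta, alpha):
    """Local histograms with sigma = 0: soft bins of width beta, Gaussian aperture alpha."""
    return np.stack([gaussian_blur(np.exp(-(image - i) ** 2 / (2 * beta ** 2)), alpha)
                     for i in levels])

def mean_median_mode(H, levels):
    p = H / H.sum(axis=0, keepdims=True)               # per-pixel distribution over levels
    mean = np.tensordot(levels, p, axes=1)
    cdf = np.cumsum(p, axis=0)
    median = levels[np.argmax(cdf >= 0.5, axis=0)]     # first level where the cdf reaches 1/2
    mode = levels[np.argmax(H, axis=0)]                # maximum mode
    return mean, median, mode

image = np.full((32, 64), 0.2)
image[:, 32:] = 0.8                                    # a step edge
levels = np.linspace(-0.5, 1.5, 201)                   # wider than [0, 1] so the mean is unbiased
H = loi(image, levels, beta=0.05, alpha=3.0)
mean, median, mode = mean_median_mode(H, levels)
```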



4 Switching modes in bi-modal histograms 

Instead of replacing each pixel with a feature of its local histogram, such as the 
median or the maximum mode, we can perform more sophisticated processing if 
we take the structure of the local histograms into account. If this histogram is 
bi-modal, this indicates the presence of multiple “objects” in the neighborhood 
of that location. Noest and Koenderink [5] have suggested dealing with partial
occlusion in this way.








Fig. 1. Left: Text hidden in a sinusoidal background, dimensions 230 × 111, intensities
in the range [0, 1]. Middle: bi-modal locations in an LOI with $\sigma = 0$, $\beta = 0.15$ and $\alpha = 1.5$.
Right: bi-modal locations have been replaced with the high mode. Text is removed and
the background restored.
Consider locations with bi-modal histograms. We let locations “switch mode”,
i.e. if they are in the high mode (that is, their original value lies to the right of
the minimum in between the two modes), we replace their value with the
low mode (or vice versa, depending on the desired effect). The idea is to replace
a bright/dark object with the most likely value of the darker/brighter
object that surrounds it, namely the low/high maximum mode. Note that
this is a two-step process: the detection of bi-modal locations is a segmentation
step, and replacing pixels fills in a value from the surroundings, using statistical
information from only those pixels that belong to the object to be filled in.
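A bare-bones version of mode switching toward the high mode could look as follows. It is a sketch under our own assumptions, not the authors' code: $\sigma = 0$, a fixed $\alpha$, a mode counts when it is a local maximum reaching 2% of the highest bin, only locations with exactly two such modes are touched, and the function names are hypothetical. The test case is a bright background with isolated dark specks, a simple form of shot noise.

```python
import numpy as np

def gaussian_blur(img, s):
    """Separable, unit-sum Gaussian blur with reflected borders."""
    r = int(np.ceil(4 * s))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * s ** 2))
    k /= k.sum()
    out = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    out = np.apply_along_axis(np.convolve, 0, out, k, mode="valid")
    return np.apply_along_axis(np.convolve, 1, out, k, mode="valid")

def loi(image, levels, beta, alpha):
    return np.stack([gaussian_blur(np.exp(-(image - i) ** 2 / (2 * beta ** 2)), alpha)
                     for i in levels])

def major_peaks(h, rel_threshold):
    """Indices of local maxima of a 1-D histogram that reach rel_threshold * max(h)."""
    left, right = np.r_[-np.inf, h[:-1]], np.r_[h[1:], -np.inf]
    return np.flatnonzero((h > left) & (h >= right) & (h >= rel_threshold * h.max()))

def switch_to_high_mode(image, levels, beta, alpha, rel_threshold=0.02):
    """At bi-modal locations, move pixels lying below the valley to the high mode."""
    H = loi(image, levels, beta, alpha)
    out = np.asarray(image, dtype=float).copy()
    for (y, x), v in np.ndenumerate(image):
        h = H[:, y, x]
        peaks = major_peaks(h, rel_threshold)
        if len(peaks) != 2:
            continue                                    # not bi-modal: leave the pixel alone
        lo, hi = peaks
        valley = levels[lo + np.argmin(h[lo:hi + 1])]   # minimum between the two modes
        if v < valley:                                  # the pixel belongs to the low mode
            out[y, x] = levels[hi]
    return out

clean = np.full((32, 32), 0.8)
noisy = clean.copy()
for y, x in [(5, 5), (10, 20), (20, 8), (25, 27)]:
    noisy[y, x] = 0.0                                   # isolated dark specks
restored = switch_to_high_mode(noisy, np.linspace(0.0, 1.0, 101), beta=0.05, alpha=1.5)
```

Switching toward the low mode instead amounts to replacing pixels above the valley with the low mode.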

This scheme allows for a scale-selection procedure. For fixed $\sigma$, $\beta$, $\alpha$, there may
be locations with more than two modes in their local distribution. This indicates
that it is worthwhile to decrease $\alpha$, focusing on a smaller neighborhood, until just
two modes remain. Thus we use a locally adaptive $\alpha$, ensuring that the replaced
pixel value comes from information from locations “as close as possible” to the
pixel to be replaced.
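The scale-selection rule can be sketched by scanning $\alpha$ from large to small and keeping, per pixel, the first value at which at most two major modes remain. The candidate list, the 5% threshold for a major mode, and the helper names are illustrative assumptions. In this sketch a pixel falls back to the smallest candidate if no $\alpha$ qualifies.

```python
import numpy as np

def gaussian_blur(img, s):
    """Separable, unit-sum Gaussian blur with reflected borders."""
    r = int(np.ceil(4 * s))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * s ** 2))
    k /= k.sum()
    out = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    out = np.apply_along_axis(np.convolve, 0, out, k, mode="valid")
    return np.apply_along_axis(np.convolve, 1, out, k, mode="valid")

def loi(image, levels, beta, alpha):
    return np.stack([gaussian_blur(np.exp(-(image - i) ** 2 / (2 * beta ** 2)), alpha)
                     for i in levels])

def count_major_modes(h, rel_threshold=0.05):
    """Number of local maxima of a 1-D histogram reaching rel_threshold * max(h)."""
    left, right = np.r_[-np.inf, h[:-1]], np.r_[h[1:], -np.inf]
    return int(((h > left) & (h >= right) & (h >= rel_threshold * h.max())).sum())

def select_alpha(image, levels, beta, alphas):
    """Per pixel, the largest alpha whose local histogram has at most two major modes."""
    alphas = sorted(alphas, reverse=True)
    chosen = np.full(np.shape(image), float(alphas[-1]))   # fallback: smallest alpha
    done = np.zeros(np.shape(image), dtype=bool)
    for a in alphas:
        counts = np.apply_along_axis(count_major_modes, 0, loi(image, levels, beta, a))
        ok = (counts <= 2) & ~done
        chosen[ok] = a
        done |= ok
    return chosen

image = np.zeros((40, 48))
image[:, :16], image[:, 16:32], image[:, 32:] = 0.1, 0.5, 0.9   # three intensity bands
chosen = select_alpha(image, np.linspace(0.0, 1.0, 101), beta=0.05, alphas=[8.0, 4.0, 2.0])
```

A pixel deep in the first band keeps $\alpha = 8$, because its neighborhood contains only two intensities. Pixels in the middle band see all three bands at that scale and fall back to $\alpha = 4$.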

We have applied this scheme successfully for the removal of text on a compli- 
cated background (Figure 1), the detection of dense objects in chest radiographs, 
and noise removal. Figure 2 shows how shot noise can be detected and replaced 
with a probable value, obtained from the local histogram. The restoration is 
near perfect. Figure 3 shows three consecutive frames from an old movie with 
severe deteriorations. To avoid detecting bi-modal locations caused by movement between
frames, we considered two LOIs, one in which the frame to be restored
was the first and one in which it was the last image. Only locations that were
bi-modal in both cases were taken into consideration. Although most artifacts are
removed, there is ample room for improvement. One can verify from the middle




Fig. 2. (top-left) Original image, 249 × 188 pixels, intensities scaled to [0,1]; (top-middle)
Original image with shot noise. This is the input image for the restoration
procedure; (top-right) Locations in (top-middle) with bi-modal histograms and pixels
in the lowest mode using $\sigma = 0$, $\beta = 0.04$, $\alpha = 0.5$; (bottom-left) Restoration using
mode-switching for bi-modal locations gives excellent results; (bottom-right) Restoration
using a 5 × 5 median filter. This removes most shot noise, but blurs the image.
Fig. 3. The top row shows three consecutive frames (337 × 271 pixels, intensities
scaled to [0, 1]) from a movie with severe local degradations, especially in the first
frame shown. LOIs were calculated with $\sigma = 0$, $\beta = 0.1$, and $\alpha = 2.0$. The second
row shows the detected artifact locations for each frame. The bottom row shows the
restored frames, using histogram mode switching.



column in Fig. 3 that the hand which makes a rapid movement has been partly 
removed. Distinguishing such movements from deteriorations is in general a very 
complicated task that would probably require a detailed analysis of the optic
flow between frames. 

5 Histogram transformations 

Any generic histogram can be transformed into any other histogram by a nonlinear,
monotonic gray-level transformation. To see this, consider an input histogram
$h_1(i)$ with cumulative histogram $H_1(i) = \int_{-\infty}^{i} h_1(i')\,di'$, and the desired
output histogram $h_2(i)$ with cumulative histogram $H_2(i)$. If we replace every $i$ with the $i'$ for which
$H_1(i) = H_2(i')$, we have transformed the cumulative histogram $H_1$ into $H_2$ and
thus also $h_1$ into $h_2$. Since cumulative histograms are monotonically increasing,
the mapping is monotonically increasing as well.
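The mapping $i \mapsto i'$ with $H_1(i) = H_2(i')$ can be sketched with empirical cumulative histograms. This NumPy sketch is our own global (non-local) illustration. It evaluates $H_1$ at the midpoint of its step so that equal input values receive equal outputs, and it inverts $H_2$ with `np.quantile` (linear interpolation); both choices are assumptions. `match_histogram` is a hypothetical name. Histogram equalization is the special case of a uniform reference.

```python
import numpy as np

def match_histogram(source, reference):
    """Monotone gray-level map i -> i' with H1(i) = H2(i') (empirical cumulative histograms)."""
    src = np.asarray(source, dtype=float).ravel()
    src_sorted = np.sort(src)
    # H1 at each pixel value, taken at the midpoint of its step so that ties share one value
    below = np.searchsorted(src_sorted, src, side="left")
    upto = np.searchsorted(src_sorted, src, side="right")
    H1 = (below + upto) / (2.0 * src.size)
    # i' = H2^{-1}(H1(i)): the reference quantile at the same cumulative level
    out = np.quantile(np.asarray(reference, dtype=float).ravel(), H1)
    return out.reshape(np.shape(source))

rng = np.random.default_rng(2)
source = rng.random((40, 50)) ** 3          # skewed towards dark values
reference = rng.normal(0.5, 0.1, 5000)      # desired output histogram
matched = match_histogram(source, reference)
```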

An example is histogram equalization. When displaying an image with a 
uniform histogram (within a certain range), all available gray levels or colors will 
be used in equal amounts and thus “perceptual contrast” is maximal. The idea to 
use local histograms (that is, selecting a proper $\alpha$ for the LOI) for equalization,
to obtain optimal contrast over each region in the image stems from the 1970s 
Panels: (a) original; (b) $\alpha = 8$, $\beta = 0.4$; (c) $\alpha = 8$, $\beta = 0.2$; (d) $\alpha = 4$, $\beta = 0.2$.



Fig. 4. A normal PA chest radiograph of 512 by 512 pixels with intensities in the range
[0,1]. (a) Original image, in which details in the lung regions and mediastinum are not
clearly visible due to the large dynamic range of gray levels. (b)-(d) Adaptive histogram
equalization (AHE) based on the LOI with σ = 0 and three combinations of α and β.



[6] and is called adaptive histogram equalization (AHE). However, it was noted
that these operations amplify noise in homogeneous regions. Pizer et al. [7]
proposed to clip the histograms: for each bin containing more pixels than a certain
threshold, the count is truncated and the excess pixels are redistributed uniformly over
all other bins. It can be seen that this ad hoc technique amounts to the same




Fig. 5. Top-left: an image (332 by 259 pixels) of rough concrete viewed frontally,
illuminated from 22°. Top-right: the same material illuminated from 45°. Bottom-left:
the top-left image with its histogram mapped to that of the top-right image, to approximate
the change in texture. Bottom-right: the result of local histogram transformation,
with α = 2. The approximation is especially improved in areas that show up
white in the images on the left. These areas are often partly shadowed under illumination
from 45°, and using a local histogram may correctly “predict” such transitions.








B. van Ginneken and B.M. ter Haar Romeny 







Fig. 6. (a) A texture from the Brodatz set [11], resolution 256 × 256, intensities in the range
[0, 1]. (b) Blurred Gaussian noise, scaled to the range [0, 1]. (c) Multiplication of (a)
and (b). (d) Reconstruction of (a) from (c), using the LOI with σ = 0, β = 0.1, α = 2
and computing for each point a mapping to the local histogram at the randomly chosen
location (80, 80).



effect as increasing β in the LOI; notably, for β → ∞, AHE has no effect. Thus
we see that the two scale parameters α and β determine the size of structures
that are enhanced and the amount of enhancement, respectively. Figure 4 shows 
a practical example of such a continuously tuned AHE for a medical modality 
(thorax X-ray) with a wide latitude of intensities. 
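To make the roles of α and β concrete, here is a small NumPy sketch of AHE computed from an LOI with σ = 0. It assumes a normalized Gaussian aperture of width α and a Gaussian tonal kernel of width β, so that the local cumulative histogram at tonal level $i$ is the α-blurred image of $\Phi((i - L)/\beta)$, with $\Phi$ the standard normal CDF; each pixel is then replaced by its own local cumulative histogram evaluated at its own intensity. The names `gaussian_blur` and `loi_ahe`, the 64 tonal levels and the reflect padding are hypothetical choices, and this is not the authors' implementation.

```python
import math

import numpy as np


def gaussian_blur(img, s):
    """Separable Gaussian blur (standard deviation s pixels) with reflect padding."""
    if s <= 0:
        return np.array(img, dtype=float)
    r = int(math.ceil(3 * s))
    u = np.arange(-r, r + 1)
    k = np.exp(-u**2 / (2.0 * s * s))
    k /= k.sum()                                    # normalized aperture
    out = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, out)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, out)


_erf = np.vectorize(math.erf)


def loi_ahe(image, alpha, beta, levels=64):
    """AHE from a locally orderless image with sigma = 0 (intensities in [0, 1]).

    Local cumulative histogram at tonal level i_k:
        C_k(x) = sum_y A(x - y; alpha) * Phi((i_k - L(y)) / beta).
    The output at x is C evaluated at the pixel's own intensity L(x).
    """
    image = np.asarray(image, dtype=float)
    ik = np.linspace(0.0, 1.0, levels)
    C = np.empty((levels,) + image.shape)
    for k in range(levels):
        phi = 0.5 * (1.0 + _erf((ik[k] - image) / (beta * math.sqrt(2.0))))
        C[k] = gaussian_blur(phi, alpha)

    # Linearly interpolate each pixel's own cumulative curve at L(x).
    t = np.clip(image, 0.0, 1.0) * (levels - 1)
    k0 = np.minimum(np.floor(t).astype(int), levels - 2)
    w = t - k0
    c0 = np.take_along_axis(C, k0[None], axis=0)[0]
    c1 = np.take_along_axis(C, (k0 + 1)[None], axis=0)[0]
    return (1.0 - w) * c0 + w * c1
```

In this sketch a perfectly flat region maps to 0.5, and contrast is created only where the local histogram contains more than one population of intensities, for instance near the boundary between two regions; a smaller α confines that effect to smaller neighborhoods.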

An alternative to histogram equalization is to increase the standard deviation 
of the histogram by a constant factor, which can be done by a linear gray level 
transformation, or variations on such schemes [8]. Again, the LOI provides us 
with an elegant framework in which the scale parameters that determine the 
results of such operations are made explicit. 

Another application of histogram transformation is to approximate changes 
in texture due to different viewing and illumination directions [9]. In general, the 
textural appearance of many common real-world materials is a complex function 
of the light field and viewing position. In computer graphics it is common prac- 
tice, however, to simply apply a projective transformation to a texture patch 
in order to account for a change in viewing direction and to adjust the mean 
brightness using a bidirectional reflectance distribution function (BRDF), often
assumed to be simply Lambertian. In [9] it is shown that this gives poor results 
for many materials, and that histogram transformations often produce far more 
realistic results. A logical next step is to consider local histogram transforma- 
tions. An example is shown in Figure 5, using a texture of rough concrete taken 
from the CURET database [10]. Instead of using one mapping function for all 
pixel intensities, the mapping is now based on the pixel intensity and the intensities
in its surroundings. Physical considerations show that this approach
makes sense: bright pixels that have dark pixels due to shadowing in their
neighborhood are more likely to become shadowed for more oblique illumination 
than those that are in the center of a bright region. 

Finally, histogram transformations can be applied to restore images that have 
been corrupted by some noise process, but for which the local histogram prop- 
erties are known or can be estimated from the corrupted image. Such cases are 
encountered frequently in practice. Many image acquisition systems introduce
artifacts that are hard to correct with calibration schemes. One example in medical
image processing is the inhomogeneity of the magnetic field of an MR scanner,
or of the sensitivity of MR surface coils, which leads to low-frequency gradients over
the image. A generated example is shown in Figure 6, where we multiplied a texture
image with blurred Gaussian noise. By randomly choosing a point in the corrupted
image and computing the mapping that transforms each local histogram to the
local histogram at that particular location, we obtain the restored image in
Figure 6(d). Apart from assuming that the corruption is of low spatial frequency, the LOI
method makes no assumption about the noise process and works for multiplicative, additive, and
other kinds of noise processes. 
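The restoration of Figure 6(d) can be sketched with the same machinery, again assuming σ = 0, a Gaussian aperture of width α and a Gaussian tonal kernel of width β. Each pixel is first replaced by its rank $p(x)$ in its own local histogram; $p$ is then passed through the inverse of the local cumulative histogram at a chosen reference pixel. The helper names, the 64 tonal levels and the quantized inverse (the smallest tonal level whose cumulative value reaches $p$) are assumptions of this sketch.

```python
import math

import numpy as np


def gaussian_blur(img, s):
    """Separable Gaussian blur (standard deviation s pixels) with reflect padding."""
    if s <= 0:
        return np.array(img, dtype=float)
    r = int(math.ceil(3 * s))
    u = np.arange(-r, r + 1)
    k = np.exp(-u**2 / (2.0 * s * s))
    k /= k.sum()
    out = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, out)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, out)


_erf = np.vectorize(math.erf)


def local_cdfs(image, alpha, beta, levels=64):
    """Local cumulative histograms C[k] at tonal levels ik (LOI with sigma = 0)."""
    ik = np.linspace(0.0, 1.0, levels)
    C = np.stack([gaussian_blur(0.5 * (1.0 + _erf((v - image) / (beta * math.sqrt(2.0)))), alpha)
                  for v in ik])
    return ik, C


def restore_to_reference(image, ref, alpha, beta, levels=64):
    """Map every local histogram onto the local histogram at pixel ref = (row, col)."""
    image = np.asarray(image, dtype=float)
    ik, C = local_cdfs(image, alpha, beta, levels)
    # Step 1: rank p(x) of L(x) in its own local histogram (linear interpolation in k).
    t = np.clip(image, 0.0, 1.0) * (levels - 1)
    k0 = np.minimum(np.floor(t).astype(int), levels - 2)
    w = t - k0
    p = ((1.0 - w) * np.take_along_axis(C, k0[None], axis=0)[0]
         + w * np.take_along_axis(C, (k0 + 1)[None], axis=0)[0])
    # Step 2: quantized inverse of the reference cumulative histogram (non-decreasing in k).
    c_ref = C[:, ref[0], ref[1]]
    idx = np.clip(np.searchsorted(c_ref, p, side="left"), 0, levels - 1)
    return ik[idx]
```

In Figure 6 the reference location was (80, 80); in this sketch any location can be used, and the whole image then takes on the tonal distribution found around that point.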



6 Texture classification and discrimination 

LOIs can be used to set up a framework for texture classification. The histogram 
is one of the simplest texture descriptions; the spatial structure has been com- 
pletely disregarded and only the probability distribution remains. This implies 
that any feature derived from LOIs is rotationally invariant. There are several
possible ways to extend LOIs:

Locally orderless derivatives 

Instead of using L(x; σ) as input for the calculation of LOIs, we can use $L_n^{\theta}(x;\sigma)$,
which denotes the nth-order spatial derivative of the image at scale σ in the
direction θ. These images can be calculated for any θ from a fixed set of basis
filters in several ways; for a discussion see [12], [13]. For n = 0, these locally
orderless derivatives (LODs) reduce to the LOIs. Alternatively, one could choose
another family of filters instead of directional derivatives of Gaussians, such as
differences of offset Gaussians [14], [15], or Gabor filters [16].

Directional locally orderless images 

Another way to introduce orientation sensitivity in LOIs is to use anisotropic 
Gaussians as local regions of interest. This would extend the construction with 
an orientation θ with 0 ≤ θ < π, and an anisotropy factor.

Cooccurrence matrices 

Haralick [17], [18] introduced cooccurrence matrices, which are joint probability
densities of the gray levels at two locations separated by a prescribed distance and orientation. Texture features
can be computed from these matrices. It is straightforward to modify the LOIs 
into a construction equivalent to cooccurrence matrices. It leads to joint proba- 
bility functions as a function of location. 
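As a point of reference, the classical cooccurrence matrix for one displacement can be sketched as follows. The optional per-pixel `weights` argument, for example a Gaussian aperture $A(y - x; \alpha)$ around a point $x$, is one hypothetical way to obtain the location-dependent joint histograms mentioned above; the function name, the quantization into 8 gray levels and intensities in [0, 1] are assumptions.

```python
import numpy as np


def cooccurrence(image, d, levels=8, weights=None):
    """Normalized gray-level cooccurrence matrix for the displacement d = (dy, dx).

    M[a, b] is the (weighted) fraction of pixel pairs (y, y + d) whose quantized
    gray levels are a and b. Passing `weights`, e.g. a Gaussian aperture centred
    on a point x, gives a location-dependent matrix in the spirit of an LOI.
    """
    q = np.clip((np.asarray(image) * levels).astype(int), 0, levels - 1)
    H, W = q.shape
    dy, dx = d
    # Index ranges of the first pixel of each pair and of its displaced partner.
    src = (slice(max(0, -dy), min(H, H - dy)), slice(max(0, -dx), min(W, W - dx)))
    dst = (slice(max(0, dy), min(H, H + dy)), slice(max(0, dx), min(W, W + dx)))
    a, b = q[src], q[dst]
    w = np.ones(a.shape) if weights is None else np.asarray(weights, dtype=float)[src]
    M = np.zeros((levels, levels))
    np.add.at(M, (a.ravel(), b.ravel()), w.ravel())  # accumulate repeated index pairs
    return M / M.sum()
```

Haralick-style texture features (contrast, energy, entropy and so on) would then be computed from `M`.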

Results from psychophysics suggest that if two textures are to be pre-attentively
discriminable by human observers, they must differ in the spatial averages
$\iint_{P_1} R(x,y)\,dx\,dy$ and $\iint_{P_2} R(x,y)\,dx\,dy$, taken over the two texture regions $P_1$ and $P_2$, of some locally computed neural response $R$ [14].
We use this as a starting point and compute features derived from LODs,
averaged over texture patches. If we used only linear operations on LODs to compute
local features, averaging would give identical results for any α. Thus we include
non-linear operations on the local histograms as well. An obvious choice is to
use higher-order moments. 
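For second moments the histograms need not even be built explicitly. With a normalized Gaussian aperture $A$ of width α, a normalized Gaussian tonal kernel of width β and σ = 0, the local histogram at $x$ is a mixture of Gaussians of variance $\beta^2$ centred on the intensities $L(y)$ and weighted by $A(x - y; \alpha)$. Its mean is therefore $(A * L)(x)$ and its variance is $(A * L^2)(x) - (A * L)(x)^2 + \beta^2$. The sketch below uses this to compute the patch-averaged local standard deviation; the function names and reflect padding are assumptions, and α = ∞ is taken to mean the whole patch.

```python
import math

import numpy as np


def gaussian_blur(img, s):
    """Separable Gaussian blur (standard deviation s pixels) with reflect padding."""
    if s <= 0:
        return np.array(img, dtype=float)
    r = int(math.ceil(3 * s))
    u = np.arange(-r, r + 1)
    k = np.exp(-u**2 / (2.0 * s * s))
    k /= k.sum()
    out = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, out)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, out)


def averaged_local_std(image, alpha, beta):
    """Patch average of the local standard deviation of an LOI with sigma = 0.

    Variance of the local histogram at x: (A*L^2)(x) - (A*L)(x)^2 + beta^2,
    with A the Gaussian aperture of width alpha; alpha = inf means the whole patch.
    """
    image = np.asarray(image, dtype=float)
    if math.isinf(alpha):
        return math.sqrt(image.var() + beta**2)
    mean = gaussian_blur(image, alpha)
    mean_sq = gaussian_blur(image**2, alpha)
    var = np.maximum(mean_sq - mean**2, 0.0) + beta**2   # clamp tiny negative round-off
    return float(np.sqrt(var).mean())
```

Evaluating this for α = 1, 2 and `math.inf` on a few input images would give a small feature vector per patch; intensities need not lie in [0, 1].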

Which combinations of the scales σ, α, β are interesting to select? First of all,
we should have σ < α; otherwise the local histograms are peaked distributions and
their higher-order moments are fully predictable. Furthermore,
in practice α and β will often be mutually exclusive. Haralick [18] defined texture
as a collection of typical elements, called tonal primitives or textons, put together 
according to placement rules. Once the scope α is much larger than the spatial size
of the textons, the local histograms hardly change any more. Therefore
it does not make sense to consider more than one LOI with α much larger
than the texton size; using α = ∞ is the obvious choice for this large-scope
histogram. Secondly, if we vary α at values below the texton size, we study the
spatial structure of the textons. For α much larger than the texton size, we are
investigating the characteristics of the placement rules.

We performed an experiment using texture patches from 16 large texture 
images from the USC-SIPI database available at http://sipi.usc.edu, 11 of 
which originated from the Brodatz collection [11]. From each texture, 16 nonover- 
lapping regions were cropped and subsampled to a resolution of 128 x 128. In- 
tensity values of each patch were normalized to zero mean and unit variance. 
Figure 7 shows one patch for each texture class. 

We classified with the nearest-neighbor rule and the leave-one-out method. A
small set of 9 features was already able to classify 255 of the 256 texture patches
correctly. This set was based on 3 input images, $L_0(x;\sigma=0)$ (used in features 1-3),
$L_1^{0^\circ}(x;\sigma=1)$ (used in features 4-6), and $L_1^{90^\circ}(x;\sigma=1)$ (used in features 7-9),
for which we calculated the averaged second moment (viz. the local standard
deviation) for β = 0.1 and α = 1, 2, ∞.

To gain more insight into the discriminative power of each of the calculated 
features separately, we performed the classification for any possible combination 
of 1, 2 or 3 out of the 9 features. The best and worst results are given in Table 
1. It is interesting to see that there is no common feature in the best single set, 
the best 2 features and the 3 best ones, which indicates that all features contain 
discriminant power. Since we use only second moments, the features are invariant to
gray-level inversion. If required, this can be remedied by adding odd higher-order moments, which was
apparently unnecessary for the test set we considered. 
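The evaluation protocol can be sketched as follows: leave-one-out accuracy of the 1-nearest-neighbor rule, plus an exhaustive search over feature subsets (9 + 36 + 84 = 129 subsets of size 1 to 3 for nine features). Euclidean distance on unscaled features is an assumption, since the text does not state the metric, and the function names are hypothetical; column indices here are 0-based, whereas Table 1 numbers the features from 1.

```python
from itertools import combinations

import numpy as np


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of the nearest-neighbour rule (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)            # a sample may not be its own neighbour
    return float((y[d.argmin(axis=1)] == y).mean())


def rank_feature_subsets(X, y, size):
    """Score every subset of `size` feature columns; return (best, worst) as (subset, accuracy)."""
    X = np.asarray(X, dtype=float)
    scores = {c: loo_1nn_accuracy(X[:, list(c)], y)
              for c in combinations(range(X.shape[1]), size)}
    best = max(scores, key=scores.get)
    worst = min(scores, key=scores.get)
    return (best, scores[best]), (worst, scores[worst])
```

With 256 patches, one misclassified patch corresponds to 255/256 ≈ 99.6%, the full-set figure in Table 1.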




Fig. 7. The 16 different textures used in a texture classification experiment. 
Feature set     Features used   Result
best single     7               47.6%
worst single    3               12.9%
best 2          4, 9            91.4%
worst 2         2, 3            41.0%
best 3          1, 5, 8         99.2%
worst 3         3, 8, 9         71.5%
full set        all             99.6%

Table 1. Classification results for various combinations of features.



7 Texture segmentation based on local histograms 

Many “general” (semi-)automatic segmentation schemes are based on the notion
that points in spatial proximity with similar intensity values are likely to belong
to the same object. Such methods have problems with textured areas, because
the intensity values may vary strongly on a local scale. A solution is to compute texture
features locally and replace pixel values with these features, assuming that
pixels that belong to the same texture region will now have a similar value. The 
framework of LOIs is ideally suited to be used for the computation of such local 
features. One could use LODs, or another extension of LOIs put forward in the 
previous section. Shi and Malik [15] have applied their normalized cut segmen- 
tation scheme to texture segmentation in this way, using local histograms and 
the correlation between them as a metric. 

Here we present an adapted version of seeded region growing (SRG), a technique that
is popular in medical image processing. For α → 0, our adaptation reduces to
a scheme very similar to the original SRG. This is a direct consequence of the fact that
LOIs contain the original image.

SRG segments an image starting from seed regions. A list is maintained of 
pixels connected to one of the regions, sorted according to some metric. This 
metric is originally defined as the squared difference between the pixel intensity
and the mean intensity of the region. The pixel at the top of the list is added to 




Fig. 8. Top row, from left to right: a 256 x 128 test image composed of two
homogeneous regions with intensity 0 and 1 and Gaussian noise with zero mean and unit
variance. An LOI with σ = 0, β = 0.2 and α = 0, 1, 4, respectively, is used for seeded
region growing from the two seeds shown in white. Since the means of the two regions
differ, regular seeded region growing (α = 0) works well. Bottom row: the same
procedure for a partly textured image; the left half was filled with sin(x/3) + sin(y/3), the
right half was set to zero, and Gaussian noise with zero mean and standard deviation 0.5 was added.
Regular seeded region growing now fails, but if α is large enough, the segmentation is
correct.
Fig. 9. Left: a test image composed of 6 texture patches of 128 x 128 pixels each.
Intensity values in each patch are normalized to zero mean and unit variance. Right:
the result of segmentation with seeded region growing based on an LOI with σ = 0,
β = 0.2 and α = 8. The circles are the seeds.



the region it is connected to, and the neighbors of the added pixel are added to 
the list. This procedure is repeated until all pixels are assigned to a region. 

We propose to compute a metric based on the local histograms of a pixel and
a region: we subtract the histograms and take the sum of the absolute values
of what is left in the bins. For α → 0 this reduces to a scheme similar to the
original scheme, except that for the region one considers the global mode instead
of the mean (most likely pixel value instead of the mean pixel value). Figures 8 
to 10 illustrate the use of seeded region growing based on local histograms. 
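A compact sketch of seeded region growing with this histogram metric is given below. A priority queue plays the role of the sorted list; a pixel's priority is the L1 distance between its local histogram and the mean local histogram of the neighboring region at the moment it is queued; growth stops when every pixel is labeled. The soft-binned `local_histograms` helper (16 bins, σ = 0), 4-connectivity, and using the running mean of the member histograms as "the histogram of the region" are assumptions of this sketch, not details from the paper.

```python
import heapq
import math

import numpy as np


def gaussian_blur(img, s):
    """Separable Gaussian blur (standard deviation s pixels) with reflect padding."""
    if s <= 0:
        return np.array(img, dtype=float)
    r = int(math.ceil(3 * s))
    u = np.arange(-r, r + 1)
    k = np.exp(-u**2 / (2.0 * s * s))
    k /= k.sum()
    out = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, out)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, out)


def local_histograms(image, alpha, beta, bins=16):
    """Soft-binned local histograms (sigma = 0): tonal kernel width beta, aperture alpha."""
    centers = (np.arange(bins) + 0.5) / bins
    h = np.exp(-(np.asarray(image, dtype=float)[..., None] - centers) ** 2 / (2 * beta**2))
    h /= h.sum(axis=-1, keepdims=True)             # each pixel contributes unit mass
    if alpha > 0:
        h = np.stack([gaussian_blur(h[..., b], alpha) for b in range(bins)], axis=-1)
    return h


def seeded_region_growing(features, seeds):
    """SRG with the L1 distance between a pixel's histogram and its region's mean histogram.

    features: (H, W, B) array of per-pixel local histograms.
    seeds: list of (row, col); seed i receives label i + 1.
    """
    H, W = features.shape[:2]
    labels = np.zeros((H, W), dtype=int)
    sums, counts, heap = {}, {}, []
    tick = 0                                       # tie-breaker keeps heap entries comparable

    def push_neighbours(y, x, lab):
        nonlocal tick
        mean = sums[lab] / counts[lab]
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < H and 0 <= nx < W and labels[ny, nx] == 0:
                delta = float(np.abs(features[ny, nx] - mean).sum())
                heapq.heappush(heap, (delta, tick, ny, nx, lab))
                tick += 1

    for lab, (y, x) in enumerate(seeds, start=1):
        labels[y, x] = lab
        sums[lab], counts[lab] = features[y, x].astype(float), 1
    for lab, (y, x) in enumerate(seeds, start=1):
        push_neighbours(y, x, lab)

    while heap:                                    # most similar candidate first
        _, _, y, x, lab = heapq.heappop(heap)
        if labels[y, x]:
            continue                               # already claimed by a region
        labels[y, x] = lab
        sums[lab] += features[y, x]
        counts[lab] += 1
        push_neighbours(y, x, lab)
    return labels
```

Calling `seeded_region_growing(local_histograms(img, alpha, beta), seeds)` with α = 0 compares single-pixel tonal profiles; a larger α lets textured regions, as in the bottom row of Fig. 8, look homogeneous to the metric.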




Fig. 10. Top row, left: wildlife scene with a leopard, 329 x 253 pixels, intensities
scaled to [0,1]. Bottom row, left: a locally (u = 8) normalized version of the
input image. Middle and right: segmentation by SRG based on an LOI with σ = 0,
β = 0.05 and α = 0 and 4, respectively. Note how well the textured area is segmented in
the lower right image.
8 Concluding remarks

In the applications presented, we have used many aspects of LOIs. They are a 
natural extension of techniques that usually use pixels, e.g. seeded region grow- 
ing. They extend techniques that use “conventional” histograms with an extra 
degree of freedom, e.g. histogram transformation techniques. Other applications 
exploit the behavior of LOIs over scale to obtain non-linear diffusions, for scale 
selection in noise removal, and to derive texture features. We conclude that LOIs 
are image representations of great practical value. 
Abstract. Indexing large archives of historical manuscripts, like the papers
of George Washington, is required to allow rapid perusal by scholars
and researchers who wish to consult the original manuscripts. Presently, 
such large archives are indexed manually. Since optical character recog- 
nition (OCR) works poorly with handwriting, a scheme based on match- 
ing word images called word spotting has been suggested previously for 
indexing such documents. The important steps in this scheme are seg- 
mentation of a document page into words and creation of lists containing 
instances of the same word by word image matching. 

We have developed a novel methodology for segmenting handwritten
document images by analyzing the extent of “blobs” in a scale space
representation of the image. We believe this is the first application of
scale space to this problem. The algorithm has been applied to around 30
grey level images randomly picked from different sections of the George
Washington corpus of 6,400 handwritten document images. Accuracies
between 77 and 96 percent were observed, with an average accuracy of around
87 percent. The algorithm works well in the presence of noise, shine-through
and other artifacts, which may arise due to aging and degradation
of the page over a couple of centuries or through the man-made processes
of photocopying and scanning.



1 Introduction 

There are many single-author historical handwritten manuscripts which would
be useful to index and search. Examples of these large archives are the papers

* This material is based on work supported in part by the National Science Founda- 
tion, Library of Congress and Department of Commerce under cooperative agree- 
ment number EEC-9209623, in part by the United States Patent and Trademark 
Office and Defense Advanced Research Projects Agency/ITO under ARPA order 
number D468, issued by ESC/AXS contract number F19628-95-C-0235, in part by 
the National Science Foundation under grant number IRI-9619117, in part by NSF 
Multimedia CDA-9502639 and in part by the Air Force Office of Scientific Research
under grant number F49620-99-1-0138. Any opinions, findings and conclusions or
recommendations expressed in this material are the authors’ and do not necessarily
reflect those of the sponsors.
of George Washington, Margaret Sanger and W. E. B. Dubois. Currently, much
of this work is done manually. For example, 50,000 pages of Margaret Sanger’s
work were recently indexed and placed on a CD-ROM; a page-by-page index
was created manually. It would be useful to automatically create an index for a
historical archive, similar to the index at the back of a printed book. To achieve
this objective, a semi-automatic scheme for indexing such documents has been
proposed in [8]. In this scheme, known as Word Spotting, the document page is
segmented into words. Lists containing multiple instances of the same
word are then created by matching word images against each other. A user then
provides the ASCII equivalent of a representative word image from each list,
and the links to the original documents are automatically generated. The earlier
work in [8] concentrated on the matching strategies and did not address full page 
segmentation issues in handwritten documents. In this paper, we propose a new 
algorithm for word segmentation in document images by considering the scale 
space behavior of blobs in line images. 

Most existing document analysis systems have been developed for machine 
printed text. There has been little work on word segmentation for handwritten 
documents. Most of this work has been applied to special kinds of pages, for
example, addresses or “clean” pages which have been written specifically for
testing the document analysis systems. Historical manuscripts suffer from many 
problems including noise, shine through and other artifacts due to aging and 
degradation. No good techniques exist to segment words from such handwritten
manuscripts. Further, scale space techniques have not been applied to this
problem before.¹ We outline the various steps in the segmentation algorithm
below.

The input to the system is a grey level document image. The image is pro- 
cessed to remove horizontal and vertical line segments likely to interfere with 
later operations. The page is then dissected into lines using projection analysis
techniques modified for gray-scale images. The projection function is smoothed
with a Gaussian filter (low-pass filtering) to eliminate false alarms, and the positions
of the local maxima (i.e., the white space between the lines) are detected. Line
segmentation, though not essential, is useful for breaking up connected ascenders
and descenders and also for deriving an automatic scale selection mechanism.
The line images are smoothed and then convolved with second order anisotropic 
Gaussian derivative filters to create a scale space and the blob like features which 
arise from this representation give us the focus of attention regions (i.e. words in 
the original document image). The problem of automatic scale selection for filter- 
ing the document is also addressed. We have come up with an efficient heuristic 
for scale selection whereby the correct scale for blob extraction is obtained by 
finding the scale maxima of the blob extent. A connected component analysis of 
the blob image followed by a reverse mapping of the bounding boxes allows us to 
extract the words. The box is then extended vertically to include the ascenders 
and descenders. Our approach to word segmentation is novel as it is the first 

¹ It is interesting to note that the first scale space paper, by T. Iijima, was written in
the context of optical character recognition in 1962 (see [12]). However, scale space
techniques are rarely used in document analysis today and, as far as we are aware,
they have not been applied to the problem of character and word segmentation.
algorithm which utilizes the inherent scale space behavior of words in grey level
document images. This paper gives a brief description of the techniques used. 
More details may be found in [11]. 
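The last two steps of this pipeline, a connected component analysis of a line's blob image followed by bounding boxes that are extended vertically, can be sketched as below. The function name `word_boxes`, 4-connectivity, and stretching each box to the full height of the line image (a simple stand-in for including the ascenders and descenders) are assumptions. The boxes are in line-image coordinates; mapping them back onto the page would add the line's row offset.

```python
from collections import deque

import numpy as np


def word_boxes(blob_mask, extend_to_line=True):
    """Bounding boxes (top, bottom, left, right) of 4-connected blobs in a line's blob mask.

    With extend_to_line=True each box is stretched vertically to the full
    height of the line image.
    """
    mask = np.asarray(blob_mask, dtype=bool)
    H, W = mask.shape
    seen = np.zeros_like(mask)
    boxes = []
    for sy in range(H):
        for sx in range(W):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            seen[sy, sx] = True
            queue = deque([(sy, sx)])
            top, bottom, left, right = sy, sy, sx, sx
            while queue:                            # breadth-first flood fill of one blob
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if extend_to_line:
                top, bottom = 0, H - 1
            boxes.append((top, bottom, left, right))
    return sorted(boxes, key=lambda b: b[2])        # left-to-right reading order
```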



1.1 Related Work 

Character segmentation schemes proposed in the literature have mostly been
developed for machine-printed characters and work poorly when extended to
handwritten text. An excellent survey of the various schemes has been presented
in [3]. Very few papers have dealt exclusively with the issue of word segmentation
in handwritten documents, and most of these have focused on identifying
gaps using geometric distance metrics between connected components. Seni and
Cohen [9] evaluate eight different distance measures between pairs of connected
components for word segmentation in handwritten text. In [7] the distance between
the convex hulls is used. Srihari et al. [10] present techniques for line
separation and then word segmentation using a neural network. However, existing
word segmentation strategies have certain limitations.

1. Almost all the above methods require binary images. Also, they have been 
tried only on clean white self-written pages and not manuscripts. 

2. Most of the techniques have been developed for machine printed characters 
and not handwritten words. The difficulty faced in word segmentation is in 
combining discrete characters into words. 

3. Most researchers focus only on word recognition algorithms and consider
a database of clean images with well-segmented words (see for example [1]).
Only a few [10] have performed full, handwritten page segmentation. However,
we feel that schemes such as [10] are not applicable to page segmentation
in manuscript images, for the reasons given below.

4. Efficient image binarization is difficult on manuscript images containing noise 
and shine through. 

5. Connected ascenders and descenders have to be separated. 

6. Prior character segmentation was required to perform word segmentation 
and accurate character segmentation in cursive writing is a difficult problem. 
Also the examples shown are contrived (self written) and do not handle 
problems in naturally written documents. 

2 Word Segmentation 

Modeling the human cognitive processes to derive a computational methodology 
for handwritten word segmentation with performance close to the human visual 
system is quite complex due to the following characteristics of handwritten text. 

1. The handwriting style may be cursive or discrete. In the case of discrete
handwriting, characters have to be combined to form words.

2. Unlike machine printed text, handwritten text is not uniformly spaced. 

3. Scale problem. For example, the size of characters in a header is generally 
larger than the average size of the characters in the body of the document. 
4. Ascenders and descenders are frequently connected and words may be present
at different orientations. 

5. Noise, artifacts, aging and other degradation of the document. Another prob- 
lem is the presence of background handwriting or shine through. 

We now present a brief background to scale space and how we have applied it 
to document analysis. 

2.1 Scale Space and Document Analysis 

Scale space theory deals with the importance of scale in any physical observation:
objects and features are relevant only at particular scales. In scale space,
starting from an original image, successively smoothed images are generated 
along the scale dimension. It has been shown by several researchers [4,6] that 
the Gaussian uniquely generates the linear scale space of the image when certain 
conditions are imposed. 

We feel that scale space also provides an ideal framework for document analysis.
We may regard a document as being formed of features at multiple scales.
Intuitively, at a finer scale we have characters and at larger scales we have 
words, phrases, lines and other structures. Hence, we may also say that there 
exists a scale at which we may derive words from a document image. We would, 
therefore, like to have an image representation which makes the features at that 
scale (words in this case) explicit. The linear scale space representation of a con- 
tinuous signal with arbitrary dimensions consists of building a one parameter 
family of signals derived from the original one in which the details are progressively
removed. Let $f: \mathbb{R}^D \to \mathbb{R}$ represent any given signal. Then the scale space
representation $L: \mathbb{R}^D \times \mathbb{R}_+ \to \mathbb{R}$ is defined (see [6]) by letting the scale space
representation at zero scale be equal to the original signal, $L(\cdot\,; 0) = f$, and, for
$t > 0$,

$$L(\cdot\,; t) = G(\cdot\,; t) * f, \tag{1}$$

where $t \in \mathbb{R}_+$ is the scale parameter and $G$ is the Gaussian kernel, which in two
dimensions ($x, y \in \mathbb{R}$) is written as

$$G(x, y; \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/(2\sigma^2)}, \tag{2}$$

where $\sigma = \sqrt{2t}$. We now describe the various stages in our algorithm.

2.2 Preprocessing 

These handwritten manuscripts have been subjected to degradation such as fad- 
ing and introduction of artifacts. The images provided to us are scanned versions 
of the photocopies of the original manuscripts. In the process of photocopying, 
horizontal and vertical black line segments/margins were introduced. Horizontal 
lines are also present within the text. The purpose of the preprocessing step is to 
remove some of these margins and lines so that they will not interfere with the 
blob analysis stage. Due to lack of space, this step is not described here. More 
details may be found in [11]. 
2.3 Line Segmentation

Line segmentation allows the ascenders and descenders of consecutive lines to be 
separated. In the manuscripts it is observed that the lines consist of a series of 
horizontal components from left to right. Projection profile techniques have been 
widely used in line and word segmentation for machine printed documents [5]. 
In this technique a 1D function of the pixel values is obtained by projecting the
binary image onto the horizontal or vertical axis. We use a modified version of the
same algorithm extended to gray-scale images. Let $f(x, y)$ be the intensity value
of a pixel $(x, y)$ in a gray-scale image. Then we define the vertical projection
profile as

$$P(y) = \sum_{x=0}^{W} f(x, y), \tag{3}$$

where $W$ is the width of the image. Fig. 1 shows a section of an image in (a) and
its projection profile in (b). The distinct local peaks in the profile correspond to
the white space between the lines, and the distinct local minima correspond to the
text (black ink). Line segmentation, therefore, involves detecting the positions of
the local maxima. However, the projection profile has a number of false local
maxima and minima. The projection function $P(y)$ is therefore smoothed with
a Gaussian (low-pass) filter to eliminate false alarms and reduce sensitivity to
noise. A smoothed profile is shown in (c). The local maxima are then obtained
from the first derivative of the projection function by solving for $y$ such that

$$P'(y) = P(y) * G_y = 0, \tag{4}$$

where $G_y$ denotes the first derivative of the Gaussian with respect to $y$.

The line segmentation technique is robust to variations in the size of the lines 
and has been tested on a wide range of handwritten pages. The next step after 
line segmentation is to create a scale space of the line images for blob analysis. 
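Equations (3) and (4) can be sketched in a few lines of NumPy: sum each row, convolve the profile with a sampled derivative of a Gaussian, and report the rows where the smoothed slope changes from positive to negative. The function name, the smoothing scale `s`, edge padding, the tolerance that treats numerically flat stretches as zero slope, and placing the separator midway across such a stretch are assumptions of this sketch; like the text, it expects bright paper and dark ink.

```python
import numpy as np


def line_separators(page, s=3.0):
    """Rows of the local maxima of the smoothed projection profile, i.e. the
    white space between text lines (bright background, dark ink assumed)."""
    P = np.asarray(page, dtype=float).sum(axis=1)     # eq. (3): P(y) = sum_x f(x, y)
    r = int(np.ceil(3 * s))
    u = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-u**2 / (2 * s**2))
    g /= g.sum()
    dg = -u / s**2 * g                                # sampled derivative of the Gaussian
    # eq. (4): P'(y) = (P * G_y)(y); edge padding avoids spurious slopes at the ends.
    dP = np.convolve(np.pad(P, r, mode="edge"), dg, mode="valid")
    # Treat numerically flat stretches as zero slope, then look for + to - changes.
    tol = 1e-9 * max(np.abs(P).max(), 1.0)
    sign = np.sign(np.where(np.abs(dP) > tol, dP, 0.0))
    nz = np.flatnonzero(sign)
    return [int((a + b) // 2) for a, b in zip(nz[:-1], nz[1:])
            if sign[a] > 0 and sign[b] < 0]
```

Minima of the profile (the ink of each line) produce − to + changes and are ignored; the returned rows can be used to cut the page into line images.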



2.4 Blob Analysis 

Now we examine each line image individually to extract the words. A word image 
is composed of discrete characters, connected characters or a combination of the 
two. We would like to merge these sub-units into a single meaningful entity 
which is a word. This may be achieved by forming a blob-like representation of 
the image. A blob can be regarded as a connected region in space. The traditional 
way of forming a blob is to use a Laplacian of a Gaussian (LOG) [6], as the LOG 
is a popular operator and frequently used in blob detection and a variety of multi- 
scale image analysis tasks [2,6]. We have used a differential expression similar to 
a LOG for creating a multi-scale representation for blob detection. However, our 
differential expression differs in that we combine second order partial Gaussian 
derivatives along the two orientations at different scales. In the next section we 
present the motivation for using an anisotropic derivative operator. 



Non Uniform Gaussian Filters. In this section some properties which char- 
acterize writing are used to formulate an approach to filtering words. In [6] 
Fig. 1: (a) A section of an image, (b) projection profile, (c) smoothed projection
profile, (d) line segmented image



Lindeberg observes that maxima in scale-space occur at a scale proportional to 
the spatial dimensions of the blob. If we observe a word we may see that the 
spatial extent of the word is determined by the following:



1. The individual characters determine the height (y dimension) of the word 
and 

2. The length (x dimension) is determined by the number of characters in it. 

A word generally contains more than one character and has an aspect ratio 
greater than one. As the x dimension of the word is larger than the y dimen- 
sion, the spatial filtering frequency should also be higher in the y dimension 
as compared to the x dimension. This domain specific knowledge allows us to 
move from isotropic (same scale in both directions) to anisotropic operators. We 
choose the x dimension scale to be larger than the y dimension to correspond to 
the spatial structure of the word. We define the anisotropic Gaussian filter as 



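$$G(x, y; \sigma_x, \sigma_y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left(-\left(\frac{x^2}{2\sigma_x^2} + \frac{y^2}{2\sigma_y^2}\right)\right). \tag{5}$$

We may also define the multiplication factor $\eta$ by $\eta = \sigma_x / \sigma_y$.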

In the scale selection section we will show that the average aspect ratio, or
the multiplication factor η, lies between three and five for most of the handwritten
documents available to us. Also, the response of the anisotropic Gaussian
filter (measured as the spatial extent of the blobs formed) is maximum in this
range. For the above Gaussian, the second-order anisotropic Gaussian differential
operator $L(x, y; \sigma_x, \sigma_y)$ is defined as

$$L(x, y; \sigma_x, \sigma_y) = G_{xx}(x, y; \sigma_x, \sigma_y) + G_{yy}(x, y; \sigma_x, \sigma_y). \tag{6}$$

A scale space representation of the line images is constructed by convolving the
image with the operator of (6). Consider a two-dimensional image $f(x, y)$; the corresponding
output image is

$$I(x, y; \sigma_x, \sigma_y) = L(x, y; \sigma_x, \sigma_y) * f(x, y). \tag{7}$$

The main features which arise from a scale space representation are blob-like 
(i.e. connected regions either brighter or darker than the background). The sign 
of I may then be used to make a classification of the 3-D intensity surface into 
foreground and background. For example consider the line image in Fig. 2(a). 
The figures show the blob images I{x,y; ax, ay) at increasing scale values. Fig. 
2(b) shows that at a lower scale the blob image consists of character blobs. As 
we increase the scale, character blobs give rise to word blobs (Fig. 2(c) and 
Fig. 2(d)). This is indicative of the phenomenon of merging in blobs. It is seen 
that for certain scale values the blobs, and hence the words, are correctly delineated
(Fig. 2(d)). A further increase in the scale value does not necessarily cause
word blobs to merge together; other phenomena, such as splitting, are also
observed. These figures show that there exists a scale at which it is possible to
delineate most words. In the next section we present an approach to automatic 
scale selection for blob extraction. 



[Fig. 2 panels: (a) a line image; (b) blob image at scale $\sigma_y = 1$, $\sigma_x = 2$; (c) blob image at scale $\sigma_y = 2$, $\sigma_x = 4$; (d) blob image at scale $\sigma_y = 4$, $\sigma_x = 16$; (e) blob image at scale $\sigma_y = 6$, $\sigma_x = 36$]

Fig. 2: A line image and the output at different scales
2.5 Choice of Scale

Scale space analysis does not address the problem of scale selection. The solution 
to this problem depends on the particular application and requires the use of 
prior information to guide the scale selection procedure. Some of our work in 
scale selection draws motivation from Lindeberg’s observation [6] that the max- 
imum response in both scale and space is obtained at a scale proportional to the 
dimension of the object. A document image consists of structures such as char- 
acters, words and lines at different scales. However, as compared to other types 
of images, document images have the unique property that a large variation in 
scale is not required to extract a particular type of structure. For example, all 
the words are essentially close together in terms of their scale and can, therefore, 
be extracted without a large variation in the scale parameter. Hence, there exists a scale at which each individual word forms a distinct blob. The output (blob response) is then maximal at this value of the scale parameter. We will show that
this scale is a function of the vertical dimension of the word if the aspect ratio 
is fixed. 

We now highlight the important differences between Lindeberg's approach to blob analysis and our work. In [6] Lindeberg determines interesting scale levels from
the maxima over scale levels of a blob measure. He defines his blob measure to 
consist of the spatial extent, contrast and lifetime. A scale space blob tree is 
then constructed to track individual blobs across scales. In our analysis, tracking individual blobs across scales is neither the relevant issue nor computationally advisable, because of the large number of blobs representing characters and words. Also, it is impossible to determine whether an extremum corresponds to a character blob or a word blob, and, as mentioned earlier, the variation of the best scale for a word is not large. What is important, however, is
that we would like to merge character blobs and yet be able to delimit the word 
blobs. Therefore, we consider a blob as a connected region in space and measure 
its spatial extent but do not give it any volumetric significance. Spatial extent 
as a blob characteristic is computationally available to us and we observe that 
it shifts with scale giving a maximum as character blobs merge to form word 
blobs. This is in agreement with the intuitive reasoning that the response of the 
word at the correct scale of observation should be maximum as every blob has 
only a range of scales (lifetime) to manifest itself. 

Our algorithm requires selecting $\sigma_y$ and the multiplication factor $\eta$ for blob extraction. We present an analysis which helped us arrive at a simple scale selection method, based on the observation that the maximum of the spatial extent of the blobs corresponds to the best scale for filtering. To measure the variation in the spatial extent of the blobs over scale, let $E_i$ denote the extent of blob $i$. The total extent $A$ of the $n$ blobs in a line is then given by

$$A = \sum_{i=1}^{n} E_i$$



Selecting $\eta$. The parameters $\sigma_y$ and $\sigma_x$ try to capture the spatial dimensions of a word. An important characteristic of a word is its aspect ratio. A manual analysis of several images showed that the average aspect ratio of a word in a document image is approximately 3.0 to 5.0. We had






R. Manmatha and N. Srimal 



earlier defined the multiplication factor as $\eta = \sigma_x / \sigma_y$. An analysis of several images reveals that, for constant $\sigma_y$, the maximum in extent is obtained for $\eta$ in the range 3 to 5. A line image and the corresponding plot are shown in Fig. 3. In this figure the maximum is obtained for $\eta$ between 3.5 and 4. This analysis along with the observation that the average aspect ratio




[Fig. 3 panels: (a) a line image; (b) plot of extent vs $\eta$ with $\sigma_y = 2$ for the above image; a maximum is obtained at $\eta = 3.8$]

Fig. 3: Variation of blob extent vs $\eta$ with constant $\sigma_y$



of the word is between 3 and 5 allows us to choose a value of $\eta$ in the range 3 to 5. Specifically, for further analysis we choose $\eta = 4$.

Selecting $\sigma_y$. Fig. 4 shows line images and the corresponding plots of extent versus $\sigma_y$ for constant $\eta$. As seen in the figures, the total extent exhibits a peak which depends on $\sigma_y$. The figures also show how the peak shifts with the size (height) of the characters. Experimentally, it was found that $\sigma_y$ (the $y$ scale) is a function of the height of the words (which is related to the height of the line). An estimate of $\sigma_y$ is obtained from the line height, i.e.

$$\sigma_y = k \times \text{line height} \qquad (8)$$

where $0 < k < 1$. The nearby scales are then examined to determine the maximum over scales. For our specific implementation we have used $k = 0.1$ and sampled $\sigma_y$ at intervals of 0.3. The two values were determined experimentally and worked well over a wide range of images.
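The scale search described above can be sketched as follows. Only $k = 0.1$ and the sampling interval 0.3 come from the text; the callback `extent_at` (which would filter the line with $\sigma_x = \eta\sigma_y$ and return the total blob extent $A$) and the number of neighboring samples `n_steps` are hypothetical.

```python
def select_sigma_y(line_height, extent_at, k=0.1, step=0.3, n_steps=10):
    """Pick sigma_y maximizing the total blob extent A near the Eq. (8) estimate.

    extent_at: hypothetical callback returning A for a given sigma_y
               (filtering the line with sigma_x = eta * sigma_y).
    n_steps:   number of samples examined on each side of the estimate
               (an assumption; the text only says "nearby scales").
    """
    sigma0 = k * line_height  # Eq. (8)
    candidates = [sigma0 + i * step for i in range(-n_steps, n_steps + 1)]
    candidates = [s for s in candidates if s > 0.0]  # scales must be positive
    return max(candidates, key=extent_at)
```

Sampling on a fixed grid around the estimate, rather than tracking blobs across scales, matches the text's point that the best word scale varies little within a document.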

2.6 Blob Extraction and Post Processing 

The blobs are then mapped back to the original image to locate the words. 
A widely used procedure is to enclose the blob in a bounding box which can 
[Fig. 4 panels: (a) a sample line image with smaller height; (b) a sample line image with larger height; (c) plot of extent vs $\sigma_y$, maximum obtained at $\sigma_y = 2.5$; (d) plot of extent vs $\sigma_y$, maximum obtained at $\sigma_y = 6.0$]

Fig. 4: Variation of blob extent vs $\sigma_y$ with constant $\eta = 4$.



be obtained through connected component analysis. In a blob representation of 
the word, localization is not maintained. Also parts of the words, especially the 
ascenders and descenders, are lost due to the earlier operations of line segmenta- 
tion and smoothing (blurring). Therefore, the above bounding box is extended 
in the vertical direction to include these ascenders and descenders. At this stage 
an area/ratio filter is used to remove small structures due to noise. 
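The bounding-box step can be sketched with a breadth-first connected component labelling of a binary blob mask (for example, the pixels where $I$ has the sign of the foreground). The 4-connectivity, the fixed vertical padding used to recover ascenders and descenders, and the plain area threshold standing in for the paper's area/ratio filter are all assumptions; the parameter values are placeholders.

```python
from collections import deque


def word_boxes(mask, vertical_pad=2, min_area=4):
    """Bounding boxes (top, left, bottom, right) of 4-connected blobs in a 0/1 mask.

    vertical_pad: rows added above and below each box to recover ascenders and
                  descenders (placeholder value).
    min_area:     blobs whose bounding box is smaller are discarded as noise
                  (a plain area test standing in for the area/ratio filter).
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill of one blob, tracking its extremes.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            top, bottom, left, right = r0, r0, c0, c0
            while queue:
                r, c = queue.popleft()
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= rr < h and 0 <= cc < w and mask[rr][cc] and not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            if (bottom - top + 1) * (right - left + 1) < min_area:
                continue  # too small: treat as noise
            # Extend vertically, clipped to the image.
            boxes.append((max(0, top - vertical_pad), left,
                          min(h - 1, bottom + vertical_pad), right))
    return boxes
```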

3 Results 

The technique was tried on 30 randomly picked images from different sections of the George Washington corpus of 6,400 images and on a few images from the archive of papers of Erasmus Hudson. To reduce the run-time, the images were smoothed and sub-sampled to a quarter of their original size. The algorithm takes 120 seconds to segment a document page of size 800 × 600 pixels on a PC with a 200 MHz Pentium processor running Linux. A segmentation accuracy ranging from 77 to 96 percent, with an average accuracy of around 87.6 percent, was observed. Fig. 5 shows part of a segmented page image with bounding boxes drawn on the extracted words. The method worked well even on faded, noisy images. Table 1 shows the results averaged over the set of 30 images. The "Average no. of words per image" column indicates the average number of distinct words in a page as seen by a human observer. The "% words detected" column indicates the percentage of words detected by the algorithm, i.e., words with a bounding box around them; this includes words correctly segmented, fragmented and combined together. The next column indicates the percentage of fragmented words, split into word fragmentation and line fragmentation. Word fragmentation occurs if a character or characters in a word have separate bounding boxes, or if 50 percent or more of a character in a word is not detected. Line fragmentation occurs due to the dissection of the image into lines: a word is line fragmented if 50 percent or more of a character lies outside the top or bottom edges of the bounding box. The "% words combined" column indicates the words which are combined together, i.e., multiple words in the same bounding box. The last column gives the percentage of correctly segmented words.

Fig. 5: Segmentation result on part of image 1670165.tif from the George Washington collection



4 Conclusion 

We have presented a novel technique for word segmentation in handwritten doc- 
uments. Our algorithm is robust and efficient for the following reasons: 

1. We use grey level images and, therefore, image binarization is not required. Image binarization requires careful pre-selection of the threshold and generally results in a loss of information. The threshold parameter has to be selected locally and is very sensitive to noise, fading and other phenomena.

2. Since the images are heavily smoothed, insignificant blobs can easily be eliminated. Therefore, the technique is comparatively unaffected by the presence of speckles, which would otherwise greatly affect techniques requiring binarization as the first step.

3. One of the major advantages of our approach is that the scheme is largely un- 
affected by shine through. This is because the algorithm is based on blurring 
and the information is extracted in the form of blobs. 

4. The algorithm makes minimal assumptions about the nature of handwriting 
and fonts and may be extended to word segmentation in other language 
documents where words are delineated by spaces. Also, the method does not 
require prior training. 



No. of documents | Average no. of words per image | % words detected | % fragmented (words + line) | % words combined | % words correctly segmented
30 | 220 | 99.12 | 1.75 + 0.86 | 8.9 | 87.6

Table 1. Table of segmentation results
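As a quick consistency check on Table 1, the detected words should equal the sum of the correctly segmented, fragmented and combined words. The snippet below only redoes that arithmetic on the reported averages; the 0.01 discrepancy is consistent with rounding.

```python
# Averages reported in Table 1 (percent of words, 30 pages, 220 words per page).
detected = 99.12
correct = 87.6
fragmented_word, fragmented_line = 1.75, 0.86
combined = 8.9
words_per_page = 220

# Detected words are the correctly segmented, fragmented and combined ones.
accounted = correct + fragmented_word + fragmented_line + combined
missed_per_page = words_per_page * (100.0 - detected) / 100.0

print(f"accounted for {accounted:.2f}% of words vs {detected:.2f}% detected")
print(f"about {missed_per_page:.1f} undetected words per page")
```

That is, on average roughly two words per page receive no bounding box at all.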
Abstract. We use an unconditionally stable numerical scheme to implement a fast version of the geodesic active contour model. The proposed scheme is useful for object segmentation in images, for example for tracking moving objects in a sequence of images. The method is based on the Weickert-Romeny-Viergever [33] AOS scheme. It is applied to small regions, motivated by the Adalsteinsson-Sethian [1] level set narrow band approach, and uses Sethian's fast marching method [26] for re-initialization. Experimental results demonstrate the power of the new method for tracking in color movies.

1 Introduction 

An important problem in image analysis is object segmentation. It involves the 
isolation of a single object from the rest of the image that may include other 
objects and a background. Here, we focus on boundary detection of one or several 
objects by a dynamic model known as the ‘geodesic active contour’ introduced 
in [4, 5, 6, 7], see also [18,28]. 

Geodesic active contours were introduced as a geometric alternative for ‘snakes’ 
[30,17]. Snakes are deformable models that are based on minimizing an energy 
along a curve. The curve, or snake, deforms its shape so as to minimize the ‘internal’ and ‘external’ energies along its boundary. The internal part causes
the boundary curve to become smooth, while the external part leads the curve 
towards the edges of the object in the image. 

In [2,21], a geometric alternative for the snake model was introduced, in which 
an evolving curve was formulated by the Osher-Sethian level set method [22]. The
method works on a fixed grid, usually the image pixels grid, and automatically 
handles changes in the topology of the evolving contour. 

The geodesic active contour model was born later. It is both a geometric model and an energy functional minimization. In [4,5], it was shown that the
geodesic active contour model is related to the classical snake model. Actually, a 
simplified snake model yields the same result as that of a geodesic active contour 
model, up to an arbitrary constant that depends on the initial parameterization. 
Unknown constants are an undesirable property in most automated models. 

Although the geodesic active contour model has many advantages over the snake, its main drawback is its non-linearity, which results in inefficient implementations. For example, explicit Euler schemes for the geodesic active contour limit the numerical step for stability. In order to overcome these limitations, a multi-resolution approach was used in [32], coupled with some additional heuristic steps, as in [23], such as computationally preferring areas of high energy.

In this paper we introduce a new method that maintains the numerical con- 
sistency and makes the geodesic active contour model computationally efficient. 
The efficiency is achieved by canceling the limitation on the time step in the 
numerical scheme, by limiting the computations to a narrow band around the active contour, and by applying an efficient re-initialization technique.



2 From snakes to geodesic active contours



Snakes were introduced in [17,30] as an active contour model for boundary seg- 
mentation. The model is derived by a variational principle from a non-geometric 
measure. The model starts from an energy functional that includes ‘internal’ and 
‘external’ terms that are integrated along a curve. 

Let the curve be $C(p) = \{x(p), y(p)\}$, where $p \in [0, 1]$ is an arbitrary parameterization. The snake model is defined by the energy functional

$$S[C] = \int_0^1 \left( |C_p|^2 + \alpha |C_{pp}|^2 + 2\beta g(C) \right) dp,$$

where $C_p = \{\partial_p x(p), \partial_p y(p)\}$, and $\alpha$ and $\beta$ are positive constants.

The last term represents an external energy, where $g(\cdot)$ is a positive edge indicator function that depends on the image; it takes small values along the edges and higher values elsewhere. Taking the variational derivative with respect to the curve, $\delta S[C]/\delta C$, we obtain the Euler-Lagrange equations

$$-C_{pp} + \alpha C_{pppp} + \beta \nabla g = 0.$$



One may start with a curve that is close to a significant local minimum of $S[C]$, and use the Euler-Lagrange equations as a gradient descent process that leads the curve to its proper position. Formally, we add a time variable $t$, and write the gradient descent process as $\partial_t C = -\delta S[C]/\delta C$ (with the constant factor 2 absorbed into the time scale), or explicitly

$$C_t = C_{pp} - \alpha C_{pppp} - \beta \nabla g.$$
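The step from the energy to the Euler-Lagrange equation is the standard first-variation computation. A sketch, assuming the perturbation $\xi$ and its derivatives vanish at the end points (or are periodic, for a closed curve):

```latex
% First variation of S[C] = \int_0^1 (|C_p|^2 + \alpha|C_{pp}|^2 + 2\beta g(C))\,dp
% under C -> C + \epsilon\xi:
\frac{d}{d\epsilon} S[C + \epsilon\xi]\Big|_{\epsilon=0}
  = \int_0^1 \left( 2\langle C_p, \xi_p\rangle
      + 2\alpha\langle C_{pp}, \xi_{pp}\rangle
      + 2\beta\langle \nabla g, \xi\rangle \right) dp
  = \int_0^1 2\left\langle -C_{pp} + \alpha C_{pppp} + \beta\nabla g,\; \xi \right\rangle dp,
% integrating by parts once in the first term and twice in the second. Hence
\frac{\delta S}{\delta C} = 2\left( -C_{pp} + \alpha C_{pppp} + \beta\nabla g \right),
% and \partial_t C = -\tfrac{1}{2}\,\delta S/\delta C gives C_t = C_{pp} - \alpha C_{pppp} - \beta\nabla g.
```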

The snake model is a linear model, and thus an efficient and powerful tool for 
object segmentation and edge integration, especially when there is a rough approximation of the boundary location. However, the model has an undesirable property: it depends on the parameterization, i.e., it is not geometric.

Motivated by the theory of curve evolution, Caselles et al. [2] and Malladi et al. [21] introduced a geometric flow that includes internal and external geometric measures. Given an initial curve $C_0$, the geometric flow is given by the planar curve evolution equation $C_t = g(C)(\kappa - v)\mathcal{N}$, where $\mathcal{N}$ is the normal to the curve, $\kappa\mathcal{N}$ is the curvature vector, $v$ is an arbitrary constant, and $g(\cdot)$, as before, is an edge indication scalar function. This is a geometric flow, that is, it is free of the parameterization. Yet, as long as $g$ does not vanish along the boundary, the curve continues its propagation and may skip its desired location. One remedy, proposed in [21], is a control procedure that monitors the propagation and sets $g$ to zero as the curve gets closer to the edge.

The geodesic active contour model was introduced in [4, 5, 6, 7], see also [18,28], 
as a geometric alternative for the snakes. The model is derived from a geometric functional, where the arbitrary parameter $p$ is replaced with the Euclidean arclength $ds = |C_p|\,dp$. The functional reads

$$S[C] = \int \left( \alpha + g(C) \right) |C_p|\,dp.$$

It may be shown to be equivalent to the arclength parameterized functional

$$S[C] = \int_0^{L(C)} g(C)\,ds + \alpha L(C),$$

where $L(C)$ is the total Euclidean length of the curve. One may equivalently absorb the constant into the edge indicator, replacing $g(x, y)$ by $g(x, y) + \alpha$, in which case

$$S[C] = \int_0^{L(C)} g(C)\,ds,$$

i.e., minimization of the modulated arclength $g(C)\,ds$. The Euler-Lagrange equations as a gradient descent process read

$$C_t = \left( g(C)\kappa - \langle \nabla g, \mathcal{N} \rangle \right) \mathcal{N}.$$

Again, internal and external forces are coupled together, yet this time in a way that leads towards a meaningful minimum, which is the minimum of the functional. One may add an additional force that comes from an area minimization term, known as the balloon force [10]. This way, the contour may be directed to propagate outwards by minimization of the exterior. The functional with the additional area term reads

$$S[C] = \int_0^{L(C)} g(C)\,ds + \alpha \iint_{\Omega_C} da,$$

where $da$ is an area element and $\Omega_C$ is the region enclosed by $C$; for example, by the divergence theorem, $\iint_{\Omega_C} da = \frac{1}{2}\oint_C \langle C, \mathcal{N}_{\mathrm{out}} \rangle\,ds$ for the outward normal $\mathcal{N}_{\mathrm{out}}$. The Euler-Lagrange equation as steepest descent is

$$C_t = \left( g(C)\kappa - \langle \nabla g, \mathcal{N} \rangle - \alpha \right) \mathcal{N}.$$

We can use our freedom of parameterization in the gradient descent flow and multiply the right-hand side again by an edge indicator, e.g., $g$. The geodesic active contour model with area as a balloon force modulated by an edge indicator is

$$C_t = \left( g(C)\kappa - \langle \nabla g, \mathcal{N} \rangle - \alpha \right) g(C)\,\mathcal{N}.$$

The connection between classical snakes and the geodesic active contour model was established in [5] via Maupertuis’ Principle of least action [12]. By
Fermat’s Principle, the final geodesic active contours are geodesics in an isotropic 
non-homogeneous medium. 

Recent applications of the geodesic active contours include 3D shape from 
multiple views, also known as shape from stereo [13], segmentation in 3D movies 
[19], tracking in 2D movies [23], and refinement of efficient segmentation in 3D 
medical images [20]. The curve propagation equation is just one part of the whole model; the geometric evolution itself is implemented by the Osher-Sethian level set method [22].



2.1 Level set method 



The Osher-Sethian [22] level set method considers evolving fronts in an implicit 
form. It is a numerical method that works on a fixed coordinate system and 
takes care of topological changes of the evolving interface. 

Consider the general geometric planar curve evolution

$$C_t = V \mathcal{N},$$

where $V$ is any intrinsic quantity, i.e., $V$ does not depend on a specific choice of parameterization. Now, let $\phi(x, y): \mathbb{R}^2 \to \mathbb{R}$ be an implicit representation of $C$, such that $C = \{(x, y) : \phi(x, y) = 0\}$. One example is a signed distance function from $C$ defined over the coordinate plane, with negative sign in the interior and positive in the exterior of the closed curve.

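The evolution for $\phi$ such that its zero set tracks the evolving contour is given by

$$\phi_t = V |\nabla\phi|.$$

This relation is easily proven by applying the chain rule to $\phi(C(t), t) = 0$, and using the fact that the normal of any level set, $\phi = \text{constant}$, is parallel to the gradient of $\phi$. With the inward normal $\mathcal{N} = -\nabla\phi/|\nabla\phi|$ (for $\phi$ negative inside),

$$\phi_t = -\langle \nabla\phi, C_t \rangle = -\langle \nabla\phi, V\mathcal{N} \rangle = V|\nabla\phi|.$$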



This formulation enables us to implement curve evolution on the fixed $x, y$ coordinate system. It automatically handles topological changes of the evolving curve: the zero level set may split from a single simply connected curve into two separate curves.

Specifically, the corresponding geodesic active contour model written in its level set formulation is given by

$$\phi_t = \operatorname{div}\left( g(x, y) \frac{\nabla\phi}{|\nabla\phi|} \right) |\nabla\phi|.$$

Including an area minimization term that yields a constant velocity, modulated by the edge indication function (by the freedom of parameterization of the gradient descent), we have

$$\phi_t = g(x, y) \left( \alpha + \operatorname{div}\left( g(x, y) \frac{\nabla\phi}{|\nabla\phi|} \right) \right) |\nabla\phi|.$$

We have yet to determine a numerical scheme and an appropriate edge indication function $g$. An explicit Euler scheme with forward time derivative introduces a numerical limitation on the time step needed for stability. Moreover, the whole domain needs to be updated each step, which is a time consuming operation for a sequential computer. The narrow band approach overcomes the last difficulty by limiting the computations to a narrow strip around the zero set. First suggested by Chopp [9] in the context of the level set method, and later developed in [1], the narrow band idea limits the computation to a tight strip of a few grid points around the zero set. The rest of the domain serves only as a sign holder. As the curve evolves, the narrow band changes its shape and serves as a dynamic numerical support around the location of the zero level set.
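A minimal sketch of the narrow band idea for the general evolution $\phi_t = V|\nabla\phi|$, here with a constant speed $V$ for simplicity. Only grid points with $|\phi|$ below the band half-width are updated; the rest keep their values and act as sign holders. Central differences are an illustrative simplification (a production level set code would use an upwind scheme), and the band half-width must exceed the per-step displacement $\tau|V|$.

```python
import math


def narrow_band(phi, width):
    """Grid points whose level set value lies within `width` of the zero set."""
    return [(i, j) for i, row in enumerate(phi)
            for j, v in enumerate(row) if abs(v) < width]


def explicit_band_step(phi, speed, dt, width):
    """One explicit Euler step of phi_t = V |grad phi|, only inside the band.

    |grad phi| uses central differences (adequate for a smooth distance map
    away from its kinks); border points and points outside the band are kept.
    """
    h, w = len(phi), len(phi[0])
    new = [row[:] for row in phi]
    for i, j in narrow_band(phi, width):
        if 0 < i < h - 1 and 0 < j < w - 1:
            gx = (phi[i][j + 1] - phi[i][j - 1]) / 2.0
            gy = (phi[i + 1][j] - phi[i - 1][j]) / 2.0
            new[i][j] = phi[i][j] + dt * speed * math.hypot(gx, gy)
    return new
```

With $\phi$ negative inside, a positive $V$ raises $\phi$ and the zero set shrinks: starting from a circle of radius 12, five steps with $\tau = 0.2$ and $V = 1$ move it to radius about 11.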

2.2 The AOS scheme 

Additive operator splitting (AOS) schemes were introduced by Weickert et al. 
[33] as an unconditionally stable numerical scheme for non-linear diffusion in 
image processing. Let us briefly review its main ingredients and adapt it to our 
model. 

The original AOS model deals with the Perona-Malik [24] non-linear image evolution equation of the form $\partial_t u = \operatorname{div}(g(|\nabla u|)\nabla u)$, with the image as initial condition, $u(0) = u_0$. Let us rewrite explicitly the right-hand side of the evolution equation

$$\operatorname{div}(g(|\nabla u|)\nabla u) = \sum_{l=1}^{m} \partial_{x_l}\left( g(|\nabla u|)\,\partial_{x_l} u \right),$$

where $l$ is an index running over the $m$ dimensions of the problem, e.g., for a 2D image $m = 2$, $x_1 = x$, and $x_2 = y$.

As a first step towards discretization consider the operator

$$A_l(u^k) = \partial_{x_l}\, g(|\nabla u^k|)\, \partial_{x_l},$$

where the superscript $k$ indicates the iteration number, e.g., $u^0 = u_0$. We can write the explicit scheme

$$u^{k+1} = \left[ I + \tau \sum_{l=1}^{m} A_l(u^k) \right] u^k,$$

where $\tau$ is the numerical time step. It requires an upper limit for $\tau$ if one desires to establish convergence to a stable steady state. Next, the semi-implicit scheme

$$u^{k+1} = \left[ I - \tau \sum_{l=1}^{m} A_l(u^k) \right]^{-1} u^k$$

is unconditionally stable, yet inverting the large bandwidth matrix is a computationally expensive operation.

Finally, the consistent, first order, semi-implicit, additive operator splitting scheme

$$u^{k+1} = \frac{1}{m} \sum_{l=1}^{m} \left[ I - m\tau A_l(u^k) \right]^{-1} u^k$$

may be applied to efficiently solve the non-linear diffusion.

The AOS semi-implicit scheme in 2D is then given by a linear tridiagonal system of equations

$$u^{k+1} = \frac{1}{2} \sum_{l=1}^{2} \left[ I - 2\tau A_l(u^k) \right]^{-1} u^k,$$

where $A_l(u^k)$ is a matrix corresponding to derivatives along the $l$-th coordinate axis. It can be solved efficiently by the Thomas algorithm, see [33].
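The Thomas algorithm and the contrast between the explicit and the semi-implicit scheme can be illustrated in 1D ($m = 1$). This sketch assumes unit grid spacing, reflecting boundaries and the common conservative discretization $(A u)_i = \sum_j \frac{g_i + g_j}{2}(u_j - u_i)$ over the neighbors $j = i \pm 1$; the helper names are ours. With $g \equiv 1$ the explicit scheme is stable only for $\tau \le 1/2$, while the semi-implicit step stays bounded for any $\tau$.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    lower[i] multiplies x[i-1] (lower[0] is ignored) and upper[i] multiplies
    x[i+1] (upper[-1] is ignored). Suitable for diagonally dominant matrices.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def diffusion_matrix(g):
    """Tridiagonal entries of A with (A u)_i = sum_j (g_i + g_j)/2 * (u_j - u_i),
    over the neighbors j = i - 1, i + 1 that exist (reflecting boundaries)."""
    n = len(g)
    lower, diag, upper = [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        if i > 0:
            w = 0.5 * (g[i] + g[i - 1])
            lower[i] = w
            diag[i] -= w
        if i < n - 1:
            w = 0.5 * (g[i] + g[i + 1])
            upper[i] = w
            diag[i] -= w
    return lower, diag, upper


def explicit_step(u, g, tau):
    """u + tau * A u: stable only for small tau (tau <= 1/2 when g = 1)."""
    lower, diag, upper = diffusion_matrix(g)
    n = len(u)
    out = []
    for i in range(n):
        au = diag[i] * u[i]
        if i > 0:
            au += lower[i] * u[i - 1]
        if i < n - 1:
            au += upper[i] * u[i + 1]
        out.append(u[i] + tau * au)
    return out


def semi_implicit_step(u, g, tau):
    """Solve (I - tau * A) u_new = u; stable for any tau > 0."""
    lower, diag, upper = diffusion_matrix(g)
    return thomas([-tau * a for a in lower],
                  [1.0 - tau * a for a in diag],
                  [-tau * a for a in upper], u)
```

For a unit spike and $\tau = 2$, the explicit step produces the value $-3$, whereas the semi-implicit step spreads the spike while preserving its sum and staying within $[0, 1]$.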

In our case, the geodesic active contour model is given by

$$\phi_t = \operatorname{div}\left( g(|\nabla u_0|) \frac{\nabla\phi}{|\nabla\phi|} \right) |\nabla\phi|,$$

where $u_0$ is the image, and $\phi$ is the implicit representation of the curve. Since our interest is only in the zero level set of $\phi$, we can reset $\phi$ to be a distance function every numerical iteration. One nice property of distance maps is their unit gradient magnitude almost everywhere. Thereby, the short term evolution for the geodesic active contour given by a distance map, with $|\nabla\phi| = 1$, is

$$\phi_t = \operatorname{div}\left( g(|\nabla u_0|) \nabla\phi \right).$$

Note that now $A_l(\phi^k) = A_l(u_0)$, which means that the matrices $[I - 2\tau A_l(u_0)]^{-1}$ can be computed once for the whole image. Yet, we need to keep the $\phi$ function as a distance map. This is done through re-initialization by Sethian's fast marching method every iteration.

It is simple to introduce a ‘balloon’ force to the scheme. The resulting AOS scheme with the ‘balloon’ then reads

$$\phi^{k+1} = \frac{1}{2} \sum_{l=1}^{2} \left[ I - 2\tau g(u_0) A_l(u_0) \right]^{-1} \left( \phi^k + \tau \alpha g(u_0) \right),$$

where $\alpha$ is the area/balloon coefficient.
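A sketch of one 2D AOS step with the balloon term, solving one tridiagonal system per row ($l = x$) and per column ($l = y$) and averaging the two results. It assumes that `g` holds precomputed edge indicator values $g(u_0)$ on the grid, unit spacing, reflecting boundaries and the conservative discretization $(A_l u)_i = \sum_j \frac{g_i + g_j}{2}(u_j - u_i)$ over the two neighbors along axis $l$. For clarity it refactors the systems on every call, instead of caching the inverses as the text suggests, and it omits the fast marching re-initialization.

```python
def thomas(lower, diag, upper, rhs):
    """Tridiagonal solve (Thomas algorithm); lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_line(values, gs, tau):
    """Solve [I - 2 tau diag(g) A] x = values along one grid line, where
    (A x)_i = sum_j (g_i + g_j)/2 (x_j - x_i) over the existing neighbors."""
    n = len(values)
    lower, diag, upper = [0.0] * n, [1.0] * n, [0.0] * n
    for i in range(n):
        s = 2.0 * tau * gs[i]  # row i is scaled by g_i (edge indicator prefactor)
        if i > 0:
            w = 0.5 * (gs[i] + gs[i - 1])
            lower[i] = -s * w
            diag[i] += s * w
        if i < n - 1:
            w = 0.5 * (gs[i] + gs[i + 1])
            upper[i] = -s * w
            diag[i] += s * w
    return thomas(lower, diag, upper, values)


def aos_balloon_step(phi, g, tau, alpha):
    """phi^{k+1} = 1/2 * sum_l [I - 2 tau g A_l]^{-1} (phi^k + tau alpha g)."""
    h, w = len(phi), len(phi[0])
    rhs = [[phi[i][j] + tau * alpha * g[i][j] for j in range(w)] for i in range(h)]
    rows = [solve_line(rhs[i], g[i], tau) for i in range(h)]  # l = x
    cols = [solve_line([rhs[i][j] for i in range(h)],
                       [g[i][j] for i in range(h)], tau) for j in range(w)]  # l = y
    return [[0.5 * (rows[i][j] + cols[j][i]) for j in range(w)] for i in range(h)]
```

Each line matrix has unit row sums and non-positive off-diagonal entries, so every solve returns a convex combination of its right-hand side; this is why the step stays bounded even for very large $\tau$.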

In order to reduce the computational cost we use a multi-scale approach 
[16]. We construct a Gaussian pyramid of the original image. The algorithm is 
first applied at the lower resolution. Next, the zero set is embedded at a higher 
resolution and the $\phi$ distance function is computed. Moreover, the computations are performed only within a limited narrow band around the zero set. The narrow band automatically modifies its shape as we re-initialize the distance map.
2.3 Re-initialization by the fast marching method

In order to maintain sub-grid accuracy, we detect the zero level set curve with 
sub-pixel accuracy. We apply a linear interpolation in the four pixel cells in which 
$\phi$ changes its sign. The grid points with the exact distance to the zero level set
are then used to initialize the fast marching method. 
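The seeding step can be sketched as follows. As a simplification (our assumption), the crossing is located by linear interpolation along each grid edge on which $\phi$ changes sign, instead of within the four-pixel cell. Each grid point adjacent to a crossing keeps its smallest such distance, assuming unit grid spacing.

```python
def zero_crossing_distances(phi):
    """Unsigned distances from grid points to the interpolated zero crossing.

    Returns {(i, j): distance} for points next to a sign change of phi.
    """
    h, w = len(phi), len(phi[0])
    known = {}

    def record(p, d):
        if d < known.get(p, float("inf")):
            known[p] = d

    for i in range(h):
        for j in range(w):
            a = phi[i][j]
            if a == 0.0:
                record((i, j), 0.0)  # grid point lies exactly on the zero set
                continue
            for ii, jj in ((i, j + 1), (i + 1, j)):  # right and lower neighbors
                if ii >= h or jj >= w:
                    continue
                b = phi[ii][jj]
                if a * b < 0.0:
                    theta = a / (a - b)  # fraction of the edge from (i, j) to the crossing
                    record((i, j), theta)
                    record((ii, jj), 1.0 - theta)
    return known
```

The returned distances, signed according to $\phi$, would then seed the fast marching front.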

Sethian’s fast marching method [27,26] is a computationally optimal numerical method for distance computation on rectangular grids. The method keeps a front of updated points sorted in a heap structure, and constructs a numerical solution iteratively, by fixing the smallest element at the top of the heap and expanding the solution to its neighboring grid points. This method enjoys a computational complexity bound of $O(N \log N)$, where $N$ is the number of grid points in the narrow band. See also [8,31], where consistent $O(N \log N)$ schemes are used to compute distance maps on rectangular grids.
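A compact sketch of first-order fast marching for unsigned distance on a unit grid, using Python's `heapq` as the heap. Seeds are given as a dictionary of known distances; marching stops once the front passes `max_dist`, which confines the work to a narrow band. The update solves the standard upwind quadratic $(t - a)^2 + (t - b)^2 = 1$ using only frozen neighbors. The interface and names are our assumptions, and the inside/outside sign would be restored afterwards from $\phi$.

```python
import heapq
import math


def fast_marching(known, shape, max_dist=math.inf):
    """First-order fast marching for unsigned distance on a unit grid.

    known:    {(i, j): distance} seed values next to the zero level set.
    shape:    (rows, cols) of the grid.
    max_dist: marching stops once the front exceeds this value (narrow band);
              points not reached are returned as math.inf.
    """
    rows, cols = shape
    dist = [[math.inf] * cols for _ in range(rows)]
    frozen = [[False] * cols for _ in range(rows)]
    heap = []
    for (i, j), d in known.items():
        dist[i][j] = d
        heapq.heappush(heap, (d, i, j))

    def frozen_value(i, j):
        if 0 <= i < rows and 0 <= j < cols and frozen[i][j]:
            return dist[i][j]
        return math.inf

    def upwind_update(i, j):
        # Smallest frozen neighbor along each axis.
        a = min(frozen_value(i, j - 1), frozen_value(i, j + 1))
        b = min(frozen_value(i - 1, j), frozen_value(i + 1, j))
        a, b = min(a, b), max(a, b)
        if b - a >= 1.0:  # also covers b == inf: one-sided update
            return a + 1.0
        # Larger root of (t - a)^2 + (t - b)^2 = 1.
        return 0.5 * (a + b + math.sqrt(2.0 - (a - b) ** 2))

    while heap:
        d, i, j = heapq.heappop(heap)
        if frozen[i][j] or d > dist[i][j]:
            continue  # stale heap entry
        if d > max_dist:
            break
        frozen[i][j] = True  # the smallest tentative value becomes final
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= ni < rows and 0 <= nj < cols and not frozen[ni][nj]:
                t = upwind_update(ni, nj)
                if t < dist[ni][nj]:
                    dist[ni][nj] = t
                    heapq.heappush(heap, (t, ni, nj))
    return [[dist[i][j] if frozen[i][j] else math.inf for j in range(cols)]
            for i in range(rows)]
```

Along the grid axes the scheme reproduces exact distances, while off-axis values carry the first-order error of the upwind update (for example $1 + 1/\sqrt{2}$ instead of $\sqrt{2}$ next to a point source).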

3 Edge indicator functions for color and movies 

Paragios and Deriche [23] introduced a probability-based edge indicator function
for movies. In this paper we have chosen the geometric philosophy to extract an 
edge indicator. What is a proper edge indicator for color images? Several gener- 
alizations for the gradient magnitude of gray level images were proposed, see e.g. 
[11,25,29]. Here we consider a measure suggested by the Beltrami framework in 
[29], to construct an edge indicator function. 



3.1 Edges in Color 

According to the Beltrami framework, a color image is considered as a two-dimensional surface in the five-dimensional spatial-spectral space. The metric tensor is used to measure distances on the image manifold. The magnitude of this tensor is an area element of the color image surface, which can be considered as a generalization of the gradient magnitude. Formally, the metric tensor of the 2D image given by the 2D surface $\{x, y, R(x, y), G(x, y), B(x, y)\}$ in the $\{x, y, R, G, B\}$ space is given by

$$(g_{\mu\nu}) = \begin{pmatrix} 1 + R_x^2 + G_x^2 + B_x^2 & R_x R_y + G_x G_y + B_x B_y \\ R_x R_y + G_x G_y + B_x B_y & 1 + R_y^2 + G_y^2 + B_y^2 \end{pmatrix},$$

where $R_x = \partial_x R$. The edge indicator function is given by $q = \det(g_{\mu\nu})$. It is simple to show that

$$q = 1 + \sum_{i=1}^{3} |\nabla u^i|^2 + \frac{1}{2} \sum_{i=1}^{3} \sum_{j=1}^{3} |\nabla u^i \times \nabla u^j|^2,$$

where $u^1 = R$, $u^2 = G$, $u^3 = B$. Then, the edge indicator function $g$ is given by a decreasing function of $q$, e.g., $g = q^{-1}$.
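To check the closed form for $q$, the sketch below evaluates both the determinant of $(g_{\mu\nu})$ and the sum-of-cross-products expression for one pixel, given the channel derivatives (how these are estimated, e.g. by finite differences, is left open), and forms $g = q^{-1}$. The function names are ours.

```python
def beltrami_q(grads):
    """q = det(g_mu_nu) of the color metric at one pixel, computed two ways.

    grads: [(R_x, R_y), (G_x, G_y), (B_x, B_y)], spatial derivatives per channel.
    Returns (q from the determinant, q from the closed-form expression).
    """
    g11 = 1.0 + sum(ux * ux for ux, _ in grads)
    g22 = 1.0 + sum(uy * uy for _, uy in grads)
    g12 = sum(ux * uy for ux, uy in grads)
    q_det = g11 * g22 - g12 * g12

    # 1 + sum_i |grad u^i|^2 + 1/2 sum_{i,j} (grad u^i x grad u^j)^2
    q_closed = 1.0 + sum(ux * ux + uy * uy for ux, uy in grads)
    q_closed += 0.5 * sum((ax * by - ay * bx) ** 2
                          for ax, ay in grads for bx, by in grads)
    return q_det, q_closed


def edge_indicator(q):
    """The decreasing function g = 1/q given as an example in the text."""
    return 1.0 / q
```

The identity behind the closed form is Lagrange's identity: $g_{11}g_{22} - g_{12}^2$ expands into the squared 2D cross products of the channel gradients. In flat regions $q = 1$ and $g = 1$, and $g$ decreases toward edges.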
3.2 Tracking objects in movies

Let us explore two possibilities to track objects in movies. The first considers the whole movie volume as a Riemannian space, as done in [7]. In this case the active contour becomes an active surface. The AOS scheme in the spatial-temporal 3D hybrid space is

$$\phi^{k+1} = \frac{1}{3} \sum_{l} \left[ I - 3\tau A_l(u_0) \right]^{-1} \phi^k,$$

where $A_l(u_0)$ is a matrix corresponding to derivatives along the $l$-th coordinate axis, where now $l \in \{x, y, T\}$.

The edge indicator function is again derived from the Beltrami framework, where for color movies we pull back the metric

$$(g_{\mu\nu}) = \begin{pmatrix} 1 + R_x^2 + G_x^2 + B_x^2 & R_x R_y + G_x G_y + B_x B_y & R_x R_T + G_x G_T + B_x B_T \\ R_x R_y + G_x G_y + B_x B_y & 1 + R_y^2 + G_y^2 + B_y^2 & R_y R_T + G_y G_T + B_y B_T \\ R_x R_T + G_x G_T + B_x B_T & R_y R_T + G_y G_T + B_y B_T & 1 + R_T^2 + G_T^2 + B_T^2 \end{pmatrix},$$

which is the metric for a 3D volume in the 6D $\{x, y, T, R, G, B\}$ spatial-temporal-spectral space. Again, setting $q = \det(g_{\mu\nu})$, we have $\sqrt{q}\,dx\,dy\,dT$ as a volume element of the image. Intuitively, the larger $q$ gets, the smaller the spatial-temporal steps one should apply in order to cover the same volume. That is, $q$ integrates the changes with respect to the $x$, $y$, and $T$ coordinates, and can thereby be considered as an edge indicator.
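The same computation extends to the $3 \times 3$ spatial-temporal metric. In this sketch (names ours), each channel contributes its $(\partial_x, \partial_y, \partial_T)$ derivatives; when all temporal derivatives vanish, $q$ reduces to the 2D value, and a purely temporal change of size $R_T$ gives $q = 1 + R_T^2$.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def spatiotemporal_q(derivs):
    """q = det(g_mu_nu) for a color movie at one voxel.

    derivs: [(R_x, R_y, R_T), (G_x, G_y, G_T), (B_x, B_y, B_T)].
    Entry (a, b) of the metric is delta_ab + sum over channels of u_a * u_b.
    """
    metric = [[(1.0 if a == b else 0.0) + sum(u[a] * u[b] for u in derivs)
               for b in range(3)] for a in range(3)]
    return det3(metric)
```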

A different approach uses the contour location in frame $n$ as an initial condition for the 2D solution in frame $n + 1$, see e.g. [3,23]. The above edge indicator is still valid in this case. Note that the aspect ratios between the time, the image space, and the intensity should be determined according to the application.

The first approach was found to yield accurate results in off-line tracking analysis. The second approach gives up some of the accuracy that the first approach achieves through temporal smoothing, in exchange for efficiency in real-time tracking.

4 Experimental Results 

As a simple example, the proposed method can be used as a consistent, un- 
conditionally stable, and computationally efficient, numerical approximation for 
the curvature flow. The curvature flow, also known as curve shortening flow or 
geometric heat equation, is a well studied equation in the theory of curve evolu- 
tion. It is proven to bring every simple closed curve into a circular point in finite 
time [14,15]. Figure 1 shows an application of the proposed method to a curve that evolves by its curvature and vanishes at a point. One can see how the number
of iterations needed for the curve to converge to a point decreases as the time 
step is increased. 
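A worked example of the finite-time collapse: for a circle, the curvature flow reduces to an ordinary differential equation for the radius. The numbers at the end are a hypothetical illustration, not taken from Fig. 1.

```latex
% A circle of radius r(t) moving inward with normal speed equal to its curvature 1/r:
\frac{dr}{dt} = -\frac{1}{r}
\;\Longrightarrow\; \frac{d}{dt}\,r^2 = -2
\;\Longrightarrow\; r(t)^2 = r_0^2 - 2t,
% so the circle vanishes at the extinction time
t^{*} = \frac{r_0^2}{2}.
% A scheme with time step \tau needs roughly t^{*}/\tau iterations; e.g. for a
% hypothetical r_0 = 40, t^{*} = 800: about 40 steps at \tau = 20 and 16 at \tau = 50.
```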

We tested several implementations for the curvature flow. Figure 2 shows the 
CPU time it takes the explicit and implicit schemes to evolve a contour into a 
circular point. For the explicit scheme we tested both the narrow band and the 
naive approach in which every grid point is updated every iteration. The tests 
[Fig. 1 panels: step 10, step 30, step 58]

Fig. 1. Curvature flow by the proposed scheme. A non-convex curve vanishes in finite time at a circular point by Grayson's Theorem. The curve evolution is presented for two different time steps. Top: $\tau = 20$; bottom: $\tau = 50$.



were performed on an Ultra SPARC 360MHz machine for a 256 x 256 resolution 
image. 

It should be noted that when the narrow band approach is used, the band width should be increased as $\tau$ grows, to ensure that the curve does not escape the band in one iteration.



[Fig. 2 plot: Curvature Flow - CPU time]

Fig. 2. Curvature flow CPU time for the explicit scheme and the implicit AOS scheme. First, the whole domain is updated; next, the narrow band is used to increase the efficiency; and finally, the AOS scheme speeds up the whole process. For the explicit scheme, the maximal time step that still maintains stability is chosen. For the AOS scheme, CPU times for several time steps are presented.



Figures 3 and 4 show segmentation results for color movies with difficult 
spatial textures. The tracking is performed at two resolutions. At the lower 
resolution we search for temporal edges and at the higher resolution we search 
for strong spatial edges. The contour found in the coarse grid is used as the 
initial contour at the fine grid. 

There are some implementation considerations one should be aware of. For 
example, if we choose a relatively large time step, the active contour may skip 
[Fig. 3 panels: frame 2, frame 22, frame 50]

Fig. 3. Tracking a cat in a color movie by the proposed scheme. Top: segmentation of the cat in a single frame. Bottom: tracking the walking cat in the 50-frame sequence.



over the boundary. The time step should thus be of similar order as the numerical support of the edges. One way to overcome this limit is to use a coarse-to-fine sequence of boundary smoothing scales, with an appropriate time step for each scale.

It is possible to compute the inverse matrices of the AOS once for the whole 
image, or to invert small sub-matrices as new points enter or exit the narrow 
band. There is obviously a trade-off between the two approaches. For initialization, we have chosen the first approach, since the initial curve starts at the frame of the image and has to travel over most of the image until it captures the moving objects. For tracking moving objects in a movie, on the other hand, we use the local approach, since the curve now only has to adjust itself to local changes.

5 Concluding Remarks 

It was shown that an integration of advanced numerical techniques yields a computationally efficient algorithm that solves a geometric segmentation model. The numerical algorithm is consistent with the underlying continuous model. The proposed ‘fast geodesic active contour’ scheme was applied successfully to image segmentation and tracking in movie sequences and color images. It combines the narrow band level set method with additive operator splitting and the fast marching method.
[Fig. 4 panels, top: step 2, step 30, step 48, step 70; bottom: frame 2, frame 21, frame 38, frame 59]

Fig. 4. Tracking two people in a color movie. Top: curve evolution in a single frame. Bottom: tracking two walking people in a 60-frame movie.
Abstract. A method for deforming curves in a given image to a desired position in a second image is introduced in this paper. The algorithm is based on deforming the first image toward the second one via a partial differential equation, while tracking the deformation of the curves of interest in the first image with an additional, coupled, partial differential equation. The tracking is performed by projecting the velocities of the first equation into the second one. In contrast with previous PDE based approaches, both the images and the curves on the frames/slices of interest are used for tracking. The technique can be applied to object tracking and sequential segmentation. The topology of the deforming curve can change, without any special topology handling procedures added to the scheme. This permits, for example, the automatic tracking of scenes where, due to occlusions, the topology of the objects of interest changes from frame to frame. In addition, this work introduces the concept of projecting velocities to obtain systems of coupled partial differential equations for image analysis applications. We show examples for object tracking and segmentation of electron microscopy images. We also briefly discuss possible uses of this framework for three dimensional morphing.



Key words: Partial differential equations, curve evolution, morphing, segmentation, tracking, topology.

1 Introduction 

In a large number of applications, we can use information from one or more images to perform some operation on an additional image. Examples of this are given in Figure 1. On the top row we have two consecutive slices of a 3D image obtained from electron microscopy. The image on the left has, superimposed, the contour of an object (a slice of a neuron). We can use this information to drive the segmentation of the next slice, the image on the right. On the bottom row we see two consecutive frames of a video sequence. The image on the left shows a marked object that we want to track. Once again, we can use the image on the left to perform the tracking operation in the image on the right. These are the types of problems we address in this paper.

Our approach is based on deforming the contours of interest from the first 
image toward the desired place in the second one. More specifically, we use a 
system of coupled Partial Differential Equations (PDE’s) to achieve this (coupled 
PDE’s have already been used in the past to address other image processing 
tasks, see [15,16] and references therein). The first partial differential equation 
deforms the first image, or features of it, toward the second one. The additional 
PDE is driven by the deformation velocity of the first one, and it deforms the 
curves of interest in the first image toward the desired position in the second 
one. This last deformation is implemented using the level-sets numerical scheme 
developed in [11], allowing for changes in the topology of the deforming curve. 
That is, if the objects of interest split or merge from the first image to the 
second one, these topology changes are automatically handled by the algorithm. 
This means that we will be able to track scenes with dynamic occlusions and to 
segment 3D medical data where the slices contain cuts with different topologies. 



2 Basic curve evolution 



Let $C(p,t): \mathbb{R} \times [0,\tau) \to \mathbb{R}^2$ be a set of closed planar curves. Assume these curves deform “in time” according to

$$\frac{\partial C(p,t)}{\partial t} = \beta\,\mathcal{N}, \tag{1}$$

where $\beta$ is a given velocity and $\mathcal{N}$ the inner unit normal to $C(p,t)$. We should note that a tangential velocity can be added to the flow, although it will not affect the geometry of the deformation, just the internal parametrization of the curve $C$. Therefore, (1) gives the most general form of geometric deformations for planar curves.

Let’s now assume that $C(p,t)$ is the level-set of a given function $u: \mathbb{R}^2 \times [0,\tau) \to \mathbb{R}$. Then, in order to represent the evolution of $C$ by that of $u$, $u$ must satisfy

$$\frac{\partial u}{\partial t} = \beta\,\|\nabla u\|, \tag{2}$$

where $\beta$ is computed at the level-sets of $u$. This is the formulation introduced by Osher and Sethian [11] to implement curve evolution flows of the type of (1). This implementation has several advantages over a direct discretization of (1). Probably the main advantage is that changes in the topology of $C(p,t)$ are automatically handled when evolving $u$; that is, there is no need for any special tracking of the topology of the level-sets (see [11] for details and [6,7] for theoretical analysis of this flow). The discretization of (2) is performed with an Eulerian approach (fixed coordinate system), as opposed to the Lagrangian approach with marker particles classically used to discretize (1). This gives a numerically stable digital-grid implementation. These reasons have motivated the use of this formulation for a large number of applications, including shape from shading, segmentation, mathematical morphology, stereo, and regularization. Extensions of the level-sets algorithm to higher dimensions are of course straightforward.
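For illustration, (2) can be discretized on the Eulerian grid with one explicit step of the standard first-order upwind scheme for $u_t = \beta\|\nabla u\|$ on a unit-spaced 2D grid. The function name `level_set_step`, the edge-replicating boundary and the use of NumPy are assumptions made for this sketch; it is not the implementation of [11]. For stability the time step should satisfy roughly $\Delta t\,\max|\beta| \le 1/2$ on this grid.

```python
import numpy as np


def level_set_step(u, beta, dt):
    """One explicit first-order upwind step of u_t = beta * |grad u|.

    Assumes unit grid spacing; axis 1 is x and axis 0 is y. `beta` may be a
    scalar or an array of the same shape as `u`.
    """
    p = np.pad(u, 1, mode="edge")          # replicate border values
    c = p[1:-1, 1:-1]
    dxm = c - p[1:-1, :-2]                  # backward difference in x
    dxp = p[1:-1, 2:] - c                   # forward difference in x
    dym = c - p[:-2, 1:-1]                  # backward difference in y
    dyp = p[2:, 1:-1] - c                   # forward difference in y
    # Upwind |grad u| where beta > 0 (level sets move toward decreasing u).
    grad_pos = np.sqrt(np.maximum(dxp, 0) ** 2 + np.minimum(dxm, 0) ** 2
                       + np.maximum(dyp, 0) ** 2 + np.minimum(dym, 0) ** 2)
    # Upwind |grad u| where beta < 0.
    grad_neg = np.sqrt(np.maximum(dxm, 0) ** 2 + np.minimum(dxp, 0) ** 2
                       + np.maximum(dym, 0) ** 2 + np.minimum(dyp, 0) ** 2)
    return u + dt * (np.maximum(beta, 0) * grad_pos
                     + np.minimum(beta, 0) * grad_neg)


# Usage: a circle of radius 20 given by its signed distance (positive outside).
y, x = np.mgrid[0:64, 0:64].astype(float)
u = np.hypot(x - 32.0, y - 32.0) - 20.0
for _ in range(10):
    u = level_set_step(u, beta=1.0, dt=0.5)   # dt * |beta| <= 1/2 on this grid
```

The scheme takes one-sided differences from the side the information comes from. With $u$ positive outside the curve and $\beta > 0$, the zero level set of the circle moves along the inner normal and shrinks. No explicit bookkeeping of the curve is involved.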



3 Morphing active contours 

Let $I_1(x,y,0): \mathbb{R}^2 \to \mathbb{R}$ be the current frame (or slice), where we have already segmented the object of interest. The boundary of this object is given by $C_{I_1}(p,0): \mathbb{R} \to \mathbb{R}^2$. Let $I_2(x,y): \mathbb{R}^2 \to \mathbb{R}$ be the image of the next frame, where we have to detect the new position of the object originally given by $C_{I_1}(p,0)$ in $I_1(x,y,0)$. Let us define a continuous and Lipschitz function $u(x,y,0): \mathbb{R}^2 \to \mathbb{R}$ such that its zero level-set is the curve $C_{I_1}(p,0)$. This function can be, for example, the signed distance function from $C_{I_1}(p,0)$. Finally, let’s also define $\mathcal{F}_1(x,y,0): \mathbb{R}^2 \to \mathbb{R}$ and $\mathcal{F}_2(x,y): \mathbb{R}^2 \to \mathbb{R}$ to be images representing features of $I_1(x,y,0)$ and $I_2(x,y)$ respectively (e.g., $\mathcal{F}_i = I_i$, or $\mathcal{F}_i$ equals the edge map of $I_i$, $i = 1,2$). With these functions as initial conditions, we define the following system of coupled evolution equations ($t$ stands for the marching variable):



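$$\begin{aligned} \frac{\partial \mathcal{F}_1(x,y,t)}{\partial t} &= \beta(x,y,t)\,\|\nabla \mathcal{F}_1(x,y,t)\|, \\ \frac{\partial u(x,y,t)}{\partial t} &= \hat\beta(x,y,t)\,\|\nabla u(x,y,t)\|, \end{aligned} \tag{3}$$

where the velocity $\hat\beta(x,y,t)$ is given by

$$\hat\beta(x,y,t) = \beta(x,y,t)\,\frac{\nabla \mathcal{F}_1(x,y,t)}{\|\nabla \mathcal{F}_1(x,y,t)\|}\cdot\frac{\nabla u(x,y,t)}{\|\nabla u(x,y,t)\|}. \tag{4}$$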



The first equation of this system is the morphing equation, where $\beta(x,y,t): \mathbb{R}^2 \times [0,\tau) \to \mathbb{R}$ is a function measuring the ‘discrepancy’ between the selected features $\mathcal{F}_1(x,y,t)$ and $\mathcal{F}_2(x,y)$. This equation morphs $\mathcal{F}_1(x,y,t)$ into $\mathcal{F}_2(x,y)$, so that $\beta(x,y,\infty) = 0$.

The second equation of this system is the tracking equation. The velocity in the second equation, $\hat\beta$, is just the velocity of the first one projected onto the normal direction of the level-sets of $u$. Since tangential velocities do not affect the geometry of the evolution, the level-sets of $\mathcal{F}_1$ and of $u$ follow exactly the same geometric flow. In other words, denoting by $\mathcal{N}_{\mathcal{F}_1}$ and $\mathcal{N}_u$ the inner normals of the level-sets of $\mathcal{F}_1$ and $u$ respectively,¹ these level-sets move with velocities $\beta\mathcal{N}_{\mathcal{F}_1}$ and $\hat\beta\mathcal{N}_u$ respectively. Since $\hat\beta\mathcal{N}_u$ is just the projection of $\beta\mathcal{N}_{\mathcal{F}_1}$ onto $\mathcal{N}_u$, both level sets follow the same geometric deformation. In particular, the zero level-set of $u$ follows the deformation of $C_{I_1}$, the curves of interest (detected boundaries in $I_1(x,y,0)$). It is important to note that, since $C_{I_1}$ is not necessarily a level-set of $I_1(x,y,0)$ or $\mathcal{F}_1(x,y,0)$, $u$ is needed to track the deformation of this curve.

¹ Recall that the normal to the level-sets is parallel to the gradient.

Since the curves of interest in $\mathcal{F}_1$ and the zero level-set of $u$ have the same initial conditions and move with the same geometric velocity, they deform in the same way. Therefore, when the morphing of $\mathcal{F}_1$ into $\mathcal{F}_2$ has been completed, the zero level-set of $u$ should be the curves of interest in the subsequent frame $I_2(x,y)$.
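The projection in (4) is the only coupling between the two equations, so it can be shown in a few lines. The sketch below is a hypothetical NumPy version. It assumes unit grid spacing, central differences for the gradients inside $\hat\beta$, first-order upwind differences for $\|\nabla\cdot\|$, and the plain difference $\beta = \mathcal{F}_2 - \mathcal{F}_1$ as a stand-in discrepancy. The names `projected_velocity` and `morph_track_step` are illustrative. Where either gradient vanishes, the cosine is set to zero; this is one possible convention and is not stated in the text.

```python
import numpy as np


def upwind_grad(u, beta):
    """Upwind |grad u| for u_t = beta * |grad u|, chosen per pixel by sign(beta)."""
    p = np.pad(u, 1, mode="edge")
    c = p[1:-1, 1:-1]
    dxm, dxp = c - p[1:-1, :-2], p[1:-1, 2:] - c
    dym, dyp = c - p[:-2, 1:-1], p[2:, 1:-1] - c
    g_pos = np.sqrt(np.maximum(dxp, 0) ** 2 + np.minimum(dxm, 0) ** 2
                    + np.maximum(dyp, 0) ** 2 + np.minimum(dym, 0) ** 2)
    g_neg = np.sqrt(np.maximum(dxm, 0) ** 2 + np.minimum(dxp, 0) ** 2
                    + np.maximum(dym, 0) ** 2 + np.minimum(dyp, 0) ** 2)
    return np.where(beta > 0, g_pos, g_neg)


def projected_velocity(beta, f1, u, eps=1e-12):
    """beta_hat of (4): beta times the cosine between grad F1 and grad u."""
    f_y, f_x = np.gradient(f1)
    u_y, u_x = np.gradient(u)
    dot = f_x * u_x + f_y * u_y
    norms = np.hypot(f_x, f_y) * np.hypot(u_x, u_y)
    cosine = np.where(norms > eps, dot / np.maximum(norms, eps), 0.0)
    return beta * cosine


def morph_track_step(f1, f2, u, dt):
    """One explicit step of system (3) with the stand-in beta = F2 - F1."""
    beta = f2 - f1
    beta_hat = projected_velocity(beta, f1, u)
    f1_new = f1 + dt * beta * upwind_grad(f1, beta)          # morphing equation
    u_new = u + dt * beta_hat * upwind_grad(u, beta_hat)     # tracking equation
    return f1_new, u_new
```

When $u$ and $\mathcal{F}_1$ have parallel gradients, $\hat\beta = \beta$ and both arrays receive identical updates. When the gradients are orthogonal, $u$ does not move at all, since the velocity is purely tangential to its level-sets.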

One could argue that the steady state of (3) is not necessarily given by the condition $\beta = 0$, since it can also be achieved with $\|\nabla \mathcal{F}_1(x,y,t)\| = 0$. This is correct, but it should not affect our tracking, since we assume that the boundaries to track are not placed over regions where there is no information and the gradient is therefore flat. Therefore, in a certain band around our boundaries the evolution will only stop when $\beta = 0$, which allows the tracking operation.

4 Examples 

For the examples in this paper, we have opted for a very simple selection of the functions in the tracking system, namely

$$\mathcal{F}_i = \mathcal{C}(I_i), \quad i = 1,2, \tag{5}$$

and

$$\beta(x,y,t) = \mathcal{F}_2(x,y) - \mathcal{F}_1(x,y,t), \tag{6}$$

where $\mathcal{C}(\cdot)$ indicates a band around $C_{I_1}$. That is, for the evolving curve $C_{I_1}$ we have an evolving band $B$ of width $w$ around it, and $\mathcal{C}(f(x,y,t)) = f(x,y,t)$ if $(x,y)$ is in $B$, and zero otherwise. This particular morphing term is a local measure of the difference between $I_1(t)$ and $I_2$. It works by increasing the grey value of $I_1(x_0,y_0,t)$ if it is smaller than $I_2(x_0,y_0)$, and decreasing it otherwise. Therefore, the steady state is obtained when both values are equal $\forall x_0, y_0$ in $B$ with $\|\nabla I_1\| \neq 0$. Note that this is a local measure, and that no hypothesis concerning the shape of the object to be tracked has been made. Having no model of the boundaries to track, the algorithm becomes very flexible. Because it is so simple, however, this particular selection has a drawback. It requires a considerable degree of similarity between the images, so that the algorithm tracks the curves of interest and does not detect spurious objects. If the set of curves $C_{I_1}$ isolates an almost uniform interior from an almost uniform exterior, as in Figure 3, then there is no need for high similarity between consecutive images. On the other hand, when working with images such as those in Figure 2, if $C_{I_1}(0)$ is too far away from the expected limit $\lim_{t\to\infty} C_{I_1}(t)$, then the abovementioned errors in the tracking procedure may occur. This similarity requirement concerns not only the shapes of the objects depicted in the image but especially their grey levels, since this $\beta$ function measures grey-level differences. Therefore, histogram equalization is always performed as a pre-processing operation.
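The concrete choices (5) and (6), together with the histogram equalization used as pre-processing, could be sketched as follows. The sketch assumes that $u$ is kept close to a signed distance to $C_{I_1}$, so that the band $B$ of width $w$ can be taken as $|u| \le w/2$. The equalization maps each grey value to its empirical cumulative frequency. All function names are hypothetical.

```python
import numpy as np


def equalize(img):
    """Histogram equalization: map each grey value to its empirical CDF in (0, 1]."""
    _, inverse, counts = np.unique(img.ravel(), return_inverse=True,
                                   return_counts=True)
    cdf = np.cumsum(counts) / img.size
    return cdf[inverse].reshape(img.shape)


def band(f, u, width):
    """The band operator C of (5): keep f where |u| <= width / 2, zero elsewhere.

    Assumes u is (approximately) a signed distance to the evolving curve.
    """
    return np.where(np.abs(u) <= width / 2.0, f, 0.0)


def discrepancy(i1, i2, u, width):
    """beta of (6): F2 - F1 with F_i = C(I_i)."""
    return band(i2, u, width) - band(i1, u, width)


# Usage: equalize both frames before building the band-limited features.
rng = np.random.default_rng(1)
frame1, frame2 = rng.integers(0, 256, (2, 32, 32))
yy, xx = np.mgrid[0:32, 0:32]
u = np.hypot(xx - 16.0, yy - 16.0) - 8.0      # signed distance to a circle
beta = discrepancy(equalize(frame1), equalize(frame2), u, width=4.0)
```

Because $\beta$ is zero outside the band, only pixels near the current curve drive the morphing. This is what makes the measure local.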

We should also note that this particular selection of $\beta$ involves information from only the two present images. Better results are expected if information from additional images in the sequence is taken into account to perform the morphing between these two.

The first example of our tracking algorithm is presented in Figure 2. This figure shows nine consecutive slices of neural tissue obtained via electron microscopy (EM). The goal of the biologist is to obtain a three-dimensional reconstruction of this neuron. As we observe from these examples, the EM images are very noisy, and the boundaries of the neuron are not easy to identify or to tell apart from other similar objects. Segmenting the neuron is therefore a difficult task. Before processing for segmentation, the images are regularized using anisotropic diffusion [1,2,14]. Active contour techniques such as those in [5,8,9,10] will normally fail with this type of image. Since the variation between consecutive slices is not too large, we can use the segmentation of the first slice (obtained either manually or with the technique described in [17]) to drive the segmentation of the next one. We can then proceed automatically to find the segmentation in the following images. In this figure, the top left image shows the manual or semi-automatic segmentation superimposed, while the following ones show the boundaries found by our algorithm.² Due to our particular choice of the $\beta$ function, dissimilarities among the images cause the algorithm to mark, as part of the boundary, small objects that are too close to our object of interest. These can be removed by simple morphological operations. Cumulative errors might cause the algorithm to lose track of the boundaries after several slices, in which case re-initialization would be required.

One could argue that we could also use the segmentation of the first frame to initialize the active contour techniques mentioned above for the next frame. We still encounter a number of difficulties with this approach: (1) the deforming curve gets attracted to local minima and often fails to detect the neuron; (2) those algorithms normally deform either inwards or outwards (mainly due to the presence of balloon-type forces), while the boundary curve corresponding to the first image is in general neither inside nor outside the object in the second image. To solve this, more elaborate techniques, e.g., [13], have to be used. Therefore, even if the image is not noisy, special techniques need to be developed and implemented to move different points of the curve in different directions.

Figure 3 shows an example of object tracking. The top left image has, superimposed, the contours of the objects to track. The following images show the contours found by our algorithm. For the sake of space, only one in every three frames is shown. Notice how topological changes are handled automatically. A pioneering topology-independent algorithm for tracking in video sequences, based on the general geodesic framework introduced in [5,9], can be found in [12]; an extension of this, with a number of key novel features, was recently reported in [13]. In contrast with our approach, that scheme is based on a single PDE (no morphing flow) that deforms the curve toward a (local) geodesic curve, and it is very sensitive to spatially and temporally noisy gradients. We should also note that, although the authors of [12] propose a fast technique to implement their flow, this technique does not actually implement their proposed algorithm. Therefore, to fully implement their scheme, other much slower techniques need to be applied. Due to the similarity between frames, our algorithm converges very fast. Both [12] and [13] use much more elaborate tracking models. Testing on some of the same sequences (e.g., the highway and two-man-walking sequences), we found that a much simpler algorithm such as the one proposed here already achieves satisfactory results. The elaborate models in their work might be needed for more difficult scenes than the ones reported in this paper. The CONDENSATION algorithm described in [3] can also achieve, in theory, topology-free tracking, though to the best of our knowledge real examples showing this capability have not yet been reported. In addition, this algorithm requires a model of the object to track and a model of the possible deformations, even for simple and useful examples such as the ones shown in this paper (note that the algorithm proposed here requires no previous or learned information). On the other hand, the outstanding tracking capabilities for cluttered scenes shown with the CONDENSATION scheme cannot be obtained with the simple selections of $\mathcal{F}_i$ and $\beta$ used for the examples in this paper, and more advanced selections must be investigated.

² A preliminary version of this algorithm has been compared with the segmentation component of [4] and found to produce better results.

Additional tracking examples are given in the next three figures. 



5 Concluding remarks 

In this paper we have presented a system of coupled PDE’s developed for image segmentation and tracking. We are also investigating the use of this technique for 3D, topology-independent morphing, and Figure 7 shows a toy example to illustrate this. There are a number of additional directions in which the framework described in this paper can be continued; we discuss some of them now.

It is of course of great importance to develop more robust selections of the feature map $\mathcal{F}_i$ and the discrepancy function $\beta$. One possible direction is to use recent image metrics based on steerable (wavelet) decompositions. This is the subject of current research.

The use of singular value decomposition and principal component analysis has become very popular in computer vision and image processing in recent years. The basic idea is to represent a given event as a linear combination of principal components from learned events. We can see the technique described here as a first step toward the deformation of principal components. That is, we can look at the curve obtained from the current slice as a principal component. We are currently investigating the extension of this technique to the deformation of a number of principal components, thereby representing a given event as a combination of deformed learned principal components. The deformations will be obtained as a system of coupled PDE’s.

The equations introduced in this paper are basically “short in memory,” that is, only the present frame is used to segment the next one. We can incorporate past information into these equations, in the form of optical flow or Kalman filtering (or the techniques of the novel scheme developed in [3]), in order to improve the detection results. Some modeling of the object of interest could be introduced in the morphing function $\beta$ as well. This will be the subject of further study.

We are also studying the extension of previous theoretical results for the 
system of coupled PDE’s presented in this paper. The equations introduced in 
this paper are one example of systems of coupled PDE’s where the velocity in 
the second equation is obtained by projecting the corresponding velocity in the 
first flow. It turns out that this technique has applications in other areas like 
denoising of vector-valued images and surface mapping. 
Fig. 2. Nine consecutive slices of neural tissue. The first image has been segmented manually. The segmentation over the sequence has been performed using the algorithm described in this paper.

Fig. 3. Nine frames of a movie. The first image has been segmented manually. The segmentation over the sequence has been performed using the algorithm described in this paper. Notice the automatic handling of topology changes.

Fig. 4. Tracking example on the “Walking swedes” movie.

Fig. 5. Tracking example on the “Highway” movie.

Fig. 6. Tracking example on the “Heart” movie.

Fig. 7. Eight steps of 3D morphing, from a given volume (top left) to eight given cubes (bottom right). This toy example uses the algorithm described in the text.
Abstract. Level set methods provide a robust way to implement geometric flows, but they suffer from two problems that are relevant when using smoothing flows to unfold the cortex: the lack of point correspondence between scales and the inability to implement tangential velocities. In this paper, we propose to solve these problems by driving the nodes of a mesh with an ordinary differential equation. We argue that this approach does not suffer from the known problems of Lagrangian methods, since all geometrical properties are computed on the fixed (Eulerian) grid. Additionally, tangential velocities can be given to the nodes, allowing the mesh to follow general evolution equations, which could be crucial to achieving the final goal of minimizing local metric distortions. To experiment with this approach, we derive area- and volume-preserving mean curvature flows and use them to unfold surfaces extracted from MRI data of the human brain.



1 Introduction 

Neural activity in high-level tasks of the brain takes place mainly in the cortex, 
which in humans is a highly folded surface with more than half of its area hid- 
den inside sulci [22,28,29]. Regions of neural activity which are close together 
in three-dimensional space may therefore be far apart when following the short- 
est path connecting them on the cortical surface. This suggests that a surface 
representation is better suited than a volumetric one for the task of functional 
analysis [8,12,28]. 

Once such a representation is available, it may be necessary to “unfold” the 
surface in order to improve visualization and analysis of the neural activity. 
Presently, this is done by representing the surface as a triangulated mesh which 
is forced to move depending on the gradient of a discrete energy measure [8,28]. 

This is a geometric Lagrangian formulation which can be exchanged for an 
Eulerian one, viewing the problem as a front propagation driven by a PDE 
which is solved on a fixed grid. This so-called “level set formulation” was initially 
proposed by Osher and Sethian in [23] and has been extensively applied to plane 
curve evolutions [3,9,19,24] and, to a lesser extent, to the evolution of closed 
surfaces [4,6,14,18]. 



M. Nielsen et al. (Eds.): Scale-Space’99, LNCS 1682, pp. 58-69, 1999.
© Springer-Verlag Berlin Heidelberg 1999
Replacing the discrete-energy minimization approach by a surface evolution is interesting, since formal results concerning existence, uniqueness, stability and correctness of the evolution may be established using results from the theory of PDE’s. When implementing the evolution, the Eulerian approach provides two primary advantages. First, it is more numerically stable, since the computations are performed on a fixed grid. This is unlike the Lagrangian approach, in which heuristic regridding procedures are necessary to avoid numerical explosions [23]. The fixed-grid approach has also been shown in [17] to regularize originally ill-posed problems. The second advantage is the ability to handle topological changes. This is useful even when the topology of the initial and final curve/surface is the same, since this ability may be required at an early stage in the evolution to escape blocking configurations (see for example the discussion on min-max flow in [27]).

On the other hand, at least three questions relevant to our goal arise when migrating to a level set approach; here we suggest answers to the second and third of them. The first is that, when unfolding the cortex, topological changes are not desirable. This raises the problem of finding a surface evolution which is topology-preserving. For planar curves, such an evolution is given by the curvature flow, but unfortunately this is not the case for surfaces. Much research has been devoted to this problem, but it remains an open one [21]. The second question is that of achieving point correspondence between surfaces at different scales. In the level set approach this correspondence is lacking, since the surface is only implicitly defined. This also gives rise to the third problem: with the level set approach, only flows that do not contain tangential velocities can be implemented. Tangential velocities do not affect the geometry of the surface, but they may be important in our application, since they do affect extrinsic functions defined on the surface.

Although very closely related, the last two problems are not exactly the same. In [1] the authors propose a solution to the correspondence problem by tracking region boundaries. Their solution, however, does not allow tangential terms to be implemented.

We propose to solve these problems by mapping the function of interest onto the nodes of a mesh and subsequently tracking these nodes by means of their corresponding differential equation. The tracking of the mesh solves the correspondence problem and, at the same time, tangential velocities can be applied to the mesh nodes. Although it may seem that this approach brings back the problems of Lagrangian formulations, this is not the case. The mesh is passively driven, and all the geometric quantities relevant to the evolution are computed on the fixed (Eulerian) grid. The proposed approach is described in detail in Section 4.

In Section 2, we derive area- and volume-preserving mean curvature flows, which are the three-dimensional extensions of the Euclidean flows presented in [26]. Although the obtained flows are not Euclidean invariant and may develop topological changes, they allow us to evaluate the tracking approach by smoothing the surface without shrinkage, and they have yielded reasonable results in practice. Section 5 provides experimental results on their use to unfold the human cortex while tracking the initial triangulated representation. Conclusions and future research directions are discussed in Section 6.

2 Normalized 3D Mean Curvature Flows 

In this section we present the evolution equations for mean curvature flows with constant total area or enclosed volume. These are direct three-dimensional extensions of the Euclidean flows described by Sapiro and Tannenbaum [26] for planar curves, and have also been studied in [2,11,25]. In the following discussion, bold letters will represent 3D vector quantities, and the integral symbol will always denote a closed surface integral over the surface. The scalar and cross products of two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ will be denoted $\mathbf{v}_1 \cdot \mathbf{v}_2$ and $\mathbf{v}_1 \times \mathbf{v}_2$ respectively. Subscripts will denote partial differentiation with respect to the subscripted parameter.

We consider the family of orientable surfaces in $\mathbb{R}^3$ denoted $\mathbf{S}(u,v,t)$, where $u$ and $v$ parameterize each surface and $t$ parameterizes time (scale). This family is obtained by the time evolution of an initial surface $\mathbf{S}_0(u,v) = \mathbf{S}(u,v,0)$ governed by the following PDE:

$$\mathbf{S}_t = H\mathbf{N} \tag{1}$$

where $H(u,v)$ is the mean curvature and $\mathbf{N}(u,v)$ is the unit inward normal vector. This evolution is known as the mean curvature flow, and its properties have been extensively studied in the past [5,7,15,16,20].

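The key idea to obtain a normalized flow is to apply a scaling to the space at each instant during the evolution. The scaling factor will be denoted $\psi(t)$. Let $\tilde{\mathbf{S}}$ be the image of $\mathbf{S}$ under this scaling:

$$\tilde{\mathbf{S}}(u,v,t) = \psi(t)\,\mathbf{S}(u,v,t) \tag{2}$$

Initially, $\psi(0) = 1$ and the two surfaces coincide. As time evolves, $\tilde{\mathbf{S}}$ describes another family of surfaces which adopts the same shapes as $\mathbf{S}$, since scaling is a similarity transformation. For the same reason, all the geometric properties of $\tilde{\mathbf{S}}$ can be inferred from those of $\mathbf{S}$. The function $\psi(t)$ can be chosen such that the volume of $\tilde{\mathbf{S}}$ remains constant:

$$\tilde V = \psi^3 V = V_0 \tag{3}$$

or such that the total area is preserved:

$$\tilde A = \psi^2 A = A_0 \tag{4}$$

By performing a change of temporal variable from $t$ to $\tau(t)$ such that $\frac{d\tau}{dt} = \psi^2$, and taking into account the relations $H = \psi\tilde H$ and $\mathbf{N} = \tilde{\mathbf{N}}$, the evolution of $\tilde{\mathbf{S}}$ may be written as:

$$\tilde{\mathbf{S}}_\tau = \frac{dt}{d\tau}\,\tilde{\mathbf{S}}_t = \tilde H\tilde{\mathbf{N}} + \frac{1}{\psi^3}\frac{d\psi}{dt}\,\tilde{\mathbf{S}} \tag{5}$$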
The value of the second term of equation (5) will depend on which quantity we wish to preserve. From (4) we obtain the area-preserving value,

$$\frac{d\psi}{dt} = -\frac{\psi^3}{2A_0}\frac{dA}{dt} \tag{6}$$

and from (3) the volume-preserving one:

$$\frac{d\psi}{dt} = -\frac{\psi^4}{3V_0}\frac{dV}{dt} \tag{7}$$
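Equations (6) and (7) follow from differentiating $\psi = (A_0/A)^{1/2}$ and $\psi = (V_0/V)^{1/3}$, which are obtained from (4) and (3). As a quick numerical sanity check, which is not part of the original method, the sketch below compares both closed forms with central differences of $\psi(t)$. It uses a hypothetical sphere whose radius obeys $r(t)^2 = r_0^2 - 2t$; any smooth $A(t)$ or $V(t)$ would serve equally well.

```python
import math


def psi_area(area, a0):
    """Scaling psi that keeps psi**2 * A equal to A0, cf. (4)."""
    return math.sqrt(a0 / area)


def psi_volume(vol, v0):
    """Scaling psi that keeps psi**3 * V equal to V0, cf. (3)."""
    return (v0 / vol) ** (1.0 / 3.0)


def dpsi_area(psi, da_dt, a0):
    """Right-hand side of (6)."""
    return -psi ** 3 / (2.0 * a0) * da_dt


def dpsi_volume(psi, dv_dt, v0):
    """Right-hand side of (7)."""
    return -psi ** 4 / (3.0 * v0) * dv_dt


R0 = 10.0  # hypothetical initial radius


def radius(t):
    return math.sqrt(R0 ** 2 - 2.0 * t)


def area(t):
    return 4.0 * math.pi * radius(t) ** 2


def volume(t):
    return 4.0 / 3.0 * math.pi * radius(t) ** 3


a0, v0 = area(0.0), volume(0.0)
t, h = 1.0, 1e-5
da_dt = -8.0 * math.pi                  # exact derivative of area(t)
dv_dt = -4.0 * math.pi * radius(t)      # exact derivative of volume(t)

# Central differences of psi(t) compared with the closed forms (6) and (7).
fd_area = (psi_area(area(t + h), a0) - psi_area(area(t - h), a0)) / (2 * h)
fd_vol = (psi_volume(volume(t + h), v0) - psi_volume(volume(t - h), v0)) / (2 * h)
err_area = abs(fd_area - dpsi_area(psi_area(area(t), a0), da_dt, a0))
err_vol = abs(fd_vol - dpsi_volume(psi_volume(volume(t), v0), dv_dt, v0))
```

Both errors are at the level of finite-difference round-off. This confirms the powers $\psi^3$ and $\psi^4$ and the factors $2A_0$ and $3V_0$.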



We see that in order to achieve constant area or volume, we need to determine the evolution of these quantities under the flow. For a surface evolving as

$$\mathbf{S}_t = \beta\mathbf{N} \tag{8}$$

the volume variation (see e.g. [2,11]) is given by the closed integral of the speed, with a minus sign since $\mathbf{N}$ points inward:

$$\frac{dV}{dt} = -\oint \beta\, d\sigma \tag{9}$$

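To compute the evolution of the area, we show in the appendix that the evolution of the vector $\mathbf{S}_u \times \mathbf{S}_v$ can be written as

$$(\mathbf{S}_u \times \mathbf{S}_v)_t = |\mathbf{S}_u \times \mathbf{S}_v|\,(2\beta H\mathbf{N} - \nabla\beta) \tag{10}$$

where $\nabla\beta$ is the vector on the tangent plane representing the gradient of the function $\beta$. Using the definition of the area element,

$$d\sigma = |\mathbf{S}_u \times \mathbf{S}_v|\,du\,dv \tag{11}$$

the evolution of the area can be obtained from (10):

$$\frac{dA}{dt} = 2\oint \beta H\, d\sigma \tag{12}$$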

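Interestingly, equation (10) can be used to prove the following proposition.

Proposition. Let $\beta: \mathbf{S} \to \mathbb{R}$ be a differentiable function defined on a closed regular surface $\mathbf{S} \subset \mathbb{R}^3$. Then the following equality holds:

$$\oint \beta\, d\sigma = \frac{1}{3}\oint \big(\beta - \mathbf{S}\cdot\nabla\beta + 2\beta H\,(\mathbf{S}\cdot\mathbf{N})\big)\, d\sigma$$

Proof. The volume enclosed by the surface is given by¹

$$V = -\frac{1}{3}\oint \mathbf{S}\cdot\mathbf{N}\, d\sigma \tag{13}$$

Using the definition of the normal vector $\mathbf{N} = (\mathbf{S}_u \times \mathbf{S}_v)/|\mathbf{S}_u \times \mathbf{S}_v|$, its evolution can be obtained from (10): $\mathbf{N}_t = -\nabla\beta$. The volume variation is given by the time derivative of (13):

$$\frac{dV}{dt} = -\frac{1}{3}\oint \big(\beta - \mathbf{S}\cdot\nabla\beta + 2\beta H\,(\mathbf{S}\cdot\mathbf{N})\big)\, d\sigma \tag{14}$$

which completes the proof by identifying with (9). □

¹ The divergence theorem relates the volume integral of the divergence of a vector field $\mathbf{A}$ to a surface integral over the surface bounding the volume as $\int_V \nabla\cdot\mathbf{A}\, dv = \oint_S \mathbf{A}\cdot\mathbf{n}\, d\sigma$, where $\mathbf{n} = -\mathbf{N}$ is the outward normal. The fact that $\nabla\cdot\mathbf{S} = 3$ implies relation (13).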
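On a triangulated surface, the area element (11) and the volume formula (13) have simple discrete counterparts. Each triangle $(\mathbf{a}, \mathbf{b}, \mathbf{c})$ contributes $\frac{1}{2}|(\mathbf{b}-\mathbf{a})\times(\mathbf{c}-\mathbf{a})|$ to the area. By the same divergence-theorem argument with $\nabla\cdot\mathbf{S} = 3$, it contributes $\frac{1}{6}\,\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})$ to the enclosed volume. The NumPy sketch below assumes that faces are ordered so that the cross product points outward; with the inward normal $\mathbf{N}$ used in the text, the same sum would carry the opposite sign. The function name is hypothetical.

```python
import numpy as np


def mesh_area_volume(vertices, faces):
    """Total area and enclosed volume of a closed triangle mesh.

    Faces are assumed to be consistently ordered so that
    (b - a) x (c - a) points outward.
    """
    a = vertices[faces[:, 0]]
    b = vertices[faces[:, 1]]
    c = vertices[faces[:, 2]]
    cross = np.cross(b - a, c - a)            # discrete analogue of S_u x S_v
    area = 0.5 * np.linalg.norm(cross, axis=1).sum()
    # div S = 3 gives V = (1/3) sum (S . n) dA = (1/6) sum a . (b x c)
    volume = np.einsum("ij,ij->i", a, np.cross(b, c)).sum() / 6.0
    return area, volume


# Usage: the unit right tetrahedron, faces oriented outward.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
faces = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])
total_area, enclosed_volume = mesh_area_volume(verts, faces)
```

The volume term does not depend on where the origin is placed, because the contributions of a closed surface cancel. This is the discrete counterpart of the divergence theorem used in the proof.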



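Taking into account the relations $H = \psi\tilde H$ and $d\sigma = \psi^{-2}\,d\tilde\sigma$, the area and volume variations under mean curvature flow may be computed on $\tilde{\mathbf{S}}$. This allows us to write the corresponding area- and volume-preserving flows by substituting (6) and (7), respectively, into (5):

$$\tilde{\mathbf{S}}_\tau = \Big(\tilde H - \frac{\tilde{\mathbf{S}}\cdot\tilde{\mathbf{N}}}{A_0}\oint \tilde H^2\, d\tilde\sigma\Big)\tilde{\mathbf{N}} \tag{15}$$

$$\tilde{\mathbf{S}}_\tau = \Big(\tilde H + \frac{\tilde{\mathbf{S}}\cdot\tilde{\mathbf{N}}}{3V_0}\oint \tilde H\, d\tilde\sigma\Big)\tilde{\mathbf{N}} \tag{16}$$

Note that the flows are geometrically intrinsic to $\tilde{\mathbf{S}}$. Also note that we have taken only the normal component of the second term in the equations, since only this term affects the geometry of the surface [26]. It is also interesting to note that, unlike the 2D case, the volume-preserving flow is not local.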



3 Level Set Formulation 



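We proceed to describe the computed flows under the level-set approach. A more formal analysis may be found in [13,23]. The surface is represented in an implicit form, as the zero level-set of a function $u(\mathbf{X},t)$:

$$\mathbf{S}_0(u,v) = \{\mathbf{X} \in \mathbb{R}^3 : u(\mathbf{X},0) = 0\} \tag{17}$$

If the surface is evolving according to

$$\mathbf{S}_t = \beta\mathbf{N} \tag{18}$$

then

$$\mathbf{S}(u,v,t) = \{\mathbf{X} \in \mathbb{R}^3 : u(\mathbf{X},t) = 0\} \quad \forall t \tag{19}$$

provided that the function $u(\mathbf{X},t): \mathbb{R}^3 \times \mathbb{R} \to \mathbb{R}$ evolves according to

$$\frac{\partial u}{\partial t} = \beta\,|\nabla u| \tag{20}$$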



Intrinsic geometric properties of the surface have implicit expressions in terms of $u$. For example, the unit inward normal vector and the mean curvature are given by

$$\mathbf{N} = -\frac{\nabla u}{|\nabla u|} \quad\text{and}\quad H = \operatorname{div}\frac{\nabla u}{|\nabla u|} \tag{21}$$

Strictly speaking, the above values give the normal vector and mean curvature of the iso-level of $u$ at $\mathbf{X}$.
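The expressions in (21) can be evaluated with finite differences. The NumPy sketch below uses central differences via `np.gradient` on a unit grid, plus a small `eps` to avoid division by zero where $\nabla u$ vanishes. These choices are assumptions, not the authors' discretization. For a signed distance to a sphere of radius $R$ (positive outside), $\operatorname{div}(\nabla u/|\nabla u|)$ equals the sum of the principal curvatures, $2/R$. Whether $H$ carries an extra factor $1/2$ or the opposite sign depends on the conventions chosen for $H$ and for the sign of $u$.

```python
import numpy as np


def normal_and_curvature(u, eps=1e-12):
    """N = -grad u / |grad u| and div(grad u / |grad u|) from (21).

    Uses central differences on a unit-spaced grid of any dimension.
    """
    grads = np.gradient(u)                          # one array per axis
    norm = np.sqrt(sum(g ** 2 for g in grads)) + eps
    unit = [g / norm for g in grads]                # grad u / |grad u|
    normal = [-n for n in unit]                     # inward where u > 0 outside
    curvature = sum(np.gradient(n, axis=k) for k, n in enumerate(unit))
    return normal, curvature


# Usage: signed distance to a sphere of radius 10 centred in a 40^3 grid.
idx = np.indices((40, 40, 40)).astype(float)
u = np.sqrt(((idx - 20.0) ** 2).sum(axis=0)) - 10.0
normal, curv = normal_and_curvature(u)
```

On this coarse grid, the value at a voxel on the sphere is close to $2/R = 0.2$, with an error of roughly one percent. It is only meaningful near the zero level set, which is why the text reads (21) as properties of the iso-level through $\mathbf{X}$.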

Finally, since the evolution equations are not local, the integrals must be 
approximated by extracting the corresponding integrands with a marching cubes 
technique. 



4 Maintaining point correspondences at different scales 

In this section we describe the tracking of the initial mesh of the surface, which contains information that is to be kept during the evolution. Formally, let

$$f(\mathbf{X}): \mathbf{S} \to \mathbb{R} \tag{22}$$

be a function on the surface, sampled at a finite number of points

$$\{\mathbf{X}_i \in \mathbf{S} : f(\mathbf{X}_i) = f_i\} \tag{23}$$

Since the surface is evolving as $\mathbf{S}_t = \beta\mathbf{N}$, each of the points moves according to the following differential equation:

$$\frac{d\mathbf{X}_i}{dt} = \beta\,\mathbf{N}(\mathbf{X}_i) = -\beta\,\frac{\nabla u}{|\nabla u|}(\mathbf{X}_i) \tag{24}$$

Its trajectory can be followed by updating its position as

$$\mathbf{X}_i(t + \Delta t) \approx \mathbf{X}_i(t) - \beta\,\Delta t\,\frac{\nabla u}{|\nabla u|}(\mathbf{X}_i) \tag{25}$$

at each step of the PDE. Note that all computations are performed on the function $u$, and therefore no harm is done if the nodes get too close to or too far from each other. Small systematic errors due to the approximation may be corrected at every iteration by projecting the points onto the zero level set of $u$:

$$\mathbf{X}_i(t + \Delta t) \leftarrow \mathbf{X}_i(t + \Delta t) - \frac{u\,\nabla u}{|\nabla u|^2}\big(\mathbf{X}_i(t + \Delta t)\big) \tag{26}$$

This projection can also be used when tangential velocities are given to the nodes, in order to force them to stay on the zero level set.
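The node update (25) and the projection (26) only need $u$ and $\nabla u$ at off-grid positions. A minimal NumPy sketch follows. It assumes trilinear interpolation of $u$ and of its central-difference gradient on a unit-spaced grid, with array index space used as world coordinates. The names `trilinear`, `advect_node` and `project_node` are hypothetical, and a production version would vectorize over all nodes and cache the gradient between steps.

```python
import numpy as np


def trilinear(field, p):
    """Trilinear interpolation of a 3D array at the index-space point p."""
    i0 = np.clip(np.floor(p).astype(int), 0, np.array(field.shape) - 2)
    f = p - i0                                   # fractional offsets in [0, 1]
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                value += w * field[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return value


def grad_at(grads, p):
    """Interpolated gradient vector at p, from precomputed np.gradient arrays."""
    return np.array([trilinear(g, p) for g in grads])


def advect_node(x, grads, beta, dt):
    """Update (25): x - beta * dt * grad u / |grad u|, a step along beta * N."""
    g = grad_at(grads, x)
    return x - beta * dt * g / np.linalg.norm(g)


def project_node(x, u, grads):
    """Projection (26): x - u * grad u / |grad u|^2, back onto the zero level set."""
    g = grad_at(grads, x)
    return x - trilinear(u, x) * g / np.dot(g, g)


# Usage: u is the signed distance to a sphere of radius 8 centred at (16, 16, 16).
idx = np.indices((32, 32, 32)).astype(float)
u = np.sqrt(((idx - 16.0) ** 2).sum(axis=0)) - 8.0
grads = np.gradient(u)
on_surface = project_node(np.array([25.3, 16.0, 16.0]), u, grads)
moved = advect_node(on_surface, grads, beta=1.0, dt=0.5)
```

Here $u$ is an exact distance along the chosen axis, so a single projection places the node on the sphere. The advection step then moves it half a voxel along the inward normal.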

Topological changes may be handled automatically by re-sampling the function on the new triangulation extracted from the level set at each step. This can be done in the following way. Let

$$\mathcal{Y} = \{Y_j \in S : u(Y_j) = 0\} \qquad (27)$$

be the set of nodes of the new mesh, which is extracted from u by a marching cubes technique. The function f can be remapped on $\mathcal{Y}$ by assigning to each $Y_j$ the linear interpolation of $f_k$, $f_l$ and $f_m$, where the three nodes $X_k$, $X_l$ and $X_m$ are such that $Y_j$ is inside the triangle of the tangent plane defined by the three points:

$$P_1 = X_k - \frac{u \nabla u}{|\nabla u|^2}(X_k), \qquad P_2 = X_l - \frac{u \nabla u}{|\nabla u|^2}(X_l), \qquad P_3 = X_m - \frac{u \nabla u}{|\nabla u|^2}(X_m) \qquad (28)$$

To find such a triangle, a search among the triangles closest to $Y_j$ is necessary, and therefore this procedure is computationally expensive. For this reason, and to make it more obvious where undesirable topological changes occur, we do not perform this step in the experiments.
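The remapping of f onto a new node $Y_j$ can be sketched as a barycentric interpolation in Python. The sketch assumes that the (expensive) triangle search has already returned the projected points $P_1$, $P_2$, $P_3$ of (28) together with the values $f_k$, $f_l$, $f_m$; `barycentric` and `remap_value` are hypothetical helper names, and $Y_j$ is implicitly projected onto the plane of the triangle.

```python
def _sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]

def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def barycentric(p, a, b, c):
    """Barycentric coordinates of p with respect to the 3D triangle (a, b, c).

    The component of p normal to the triangle's plane is ignored."""
    v0, v1, v2 = _sub(b, a), _sub(c, a), _sub(p, a)
    d00, d01, d11 = _dot(v0, v0), _dot(v0, v1), _dot(v1, v1)
    d20, d21 = _dot(v2, v0), _dot(v2, v1)
    denom = d00 * d11 - d01 * d01
    wb = (d11 * d20 - d01 * d21) / denom
    wc = (d00 * d21 - d01 * d20) / denom
    return 1.0 - wb - wc, wb, wc

def remap_value(y, p1, p2, p3, f1, f2, f3, eps=1e-9):
    """Linear interpolation of f at the new node y, or None if y lies outside the triangle."""
    w1, w2, w3 = barycentric(y, p1, p2, p3)
    if min(w1, w2, w3) < -eps:
        return None
    return w1 * f1 + w2 * f2 + w3 * f3

value = remap_value([0.25, 0.25, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.0, 1.0, 2.0)
print(value)  # 0.75
```

Returning `None` for nodes outside the triangle lets the caller continue the search with the next candidate triangle.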



5 Results 

Here we describe the results obtained by applying the normalized mean curvature flows, together with the tracking framework described in the previous section, to unfold surfaces extracted from pre-segmented MRI data of the human brain.
The tracked function in the examples is the sign of the mean curvature, light 
regions indicating concave folds. 

Fig. 1 shows a first example starting with a reduced and slightly smoothed 
version of the cortex. This surface was obtained by applying a scaling flow: 

$$S_t = -(S \cdot N)\, N \qquad (29)$$

to the surface in order to reduce its size, followed by a few steps of Mean Cur- 
vature Flow (MCF). The columns correspond to three different views. The first 
row shows the initial surface. It can be observed that the relative areas of light 
and dark regions are approximately the same. This qualitative evaluation may 
already be useful for discarding flows that obviously change this balance. This is
the case for the area-preserving flow, whose results are shown in the second row 
of Fig. 1. It is clear that the dark regions become too wide while the light regions 
grow too thin. The balance between light and dark regions is not preserved at 
all. This is undesirable since the goal of the unfolding is to improve visibility 
in the light regions, i.e. the sulci. The third row is the result obtained with the 
volume-preserving flow. Here the proportion of dark and light regions is better 
preserved. The fourth row is the result obtained by applying MCF alone. The 
proportions are again qualitatively well preserved. Quantitative measures are
required to evaluate these results more precisely.

The second example (Fig. 2) shows results with the original cortical surface
extracted from the MRI data (i.e. without the preprocessing applied in the previous
example). In this case, only mean curvature flow and its volume-preserving
version were tested. The first row shows the initial geometry of the cortex, while 
the second row presents the geometry as obtained by applying volume-preserving 
MCF. In the third row, the sign-of-curvature function has been mapped on this 
same surface. The last row shows the result of applying MCF alone. In this ex- 
ample, the results are very similar with respect to the distribution of the tracked 
regions. 
Fig. 2. Unfolding the cortex as segmented from the MRI image. The first two rows show the geometry of the initial and final surfaces, without mapping the sign-of-curvature function. In the third row, the function is mapped using volume-preserving MCF, while the fourth row shows the result of applying MCF alone. In this last row the zoom is larger in order to better visualize the sign-of-curvature function.
6 Conclusion

We have presented normalized mean curvature flows together with a tracking framework that makes it possible to maintain knowledge of an extrinsic function defined on the surface. These flows were used as first attempts to solve the problem of unfolding the cortex using level set techniques, and have indeed yielded encouraging results. Nevertheless, further research is needed in order to obtain a front propagation model that takes into account the physical constraints of the problem, i.e. minimum variation of geodesic distances and no topological changes. By allowing tangential movements of the tracked nodes, our approach makes general propagation models (i.e. containing normal as well as tangential terms) applicable to those nodes.

Appendix

Here we show how to obtain equation (10), which gives the evolution of $S_u \times S_v$. Direct differentiation with respect to time gives:

$$(S_u \times S_v)_t = (\beta_u N + \beta N_u) \times S_v + S_u \times (\beta_v N + \beta N_v) \qquad (30)$$

$$= \underbrace{\beta\,(N_u \times S_v + S_u \times N_v)}_{\mathcal{N}} - \underbrace{(\beta_u\, S_v \times N + \beta_v\, N \times S_u)}_{\mathcal{T}} \qquad (31)$$

The first term is normal since $N_u$, $N_v$, $S_u$ and $S_v$ are all four tangential. Moreover, using the fact ([10]) that $N_u$ and $N_v$ are decomposed in the tangent plane as:

$$N_u = a_{11} S_u + a_{12} S_v, \qquad N_v = a_{21} S_u + a_{22} S_v \qquad (32)$$

with

$$a_{11} + a_{22} = 2H \qquad (33)$$

we obtain $\mathcal{N} = \beta (a_{11} + a_{22})\, S_u \times S_v$, and since $S_u \times S_v = |S_u \times S_v|\, N$ (see (37) below), we have

$$\mathcal{N} = |S_u \times S_v|\, 2\beta H\, N \qquad (34)$$

The second term is obviously tangential and actually gives the gradient of β in the tangent plane. To see this, we will show that its scalar product with an arbitrary vector v of the tangent plane is proportional to the directional derivative of β in the direction of v, which is the definition of a gradient operator. Let v be expressed as

$$v = a_1 S_u + a_2 S_v \qquad (35)$$

We have

$$\mathcal{T} \cdot v = a_1 \beta_u\, S_u \cdot (S_v \times N) + a_2 \beta_v\, S_v \cdot (N \times S_u) \qquad (36)$$

but

$$S_u \cdot (S_v \times N) = S_v \cdot (N \times S_u) = |S_u \times S_v| \qquad (37)$$

so that

$$\frac{1}{|S_u \times S_v|}\, \mathcal{T} \cdot v = a_1 \beta_u + a_2 \beta_v \qquad (38)$$

The right-hand side of equation (38) is the directional derivative of β in the direction of v. We therefore may write

$$\mathcal{T} = |S_u \times S_v|\, \nabla\beta \qquad (39)$$

Combining equations (31), (34) and (39) gives equation (10):

$$(S_u \times S_v)_t = |S_u \times S_v|\, (2\beta H N - \nabla\beta) \qquad (40)$$
Abstract. This paper is concerned with the simulation of the Partial Differential Equation (PDE) driven evolution of a closed surface by means of an implicit representation. In most applications, the natural choice for the implicit representation is the signed distance function to the closed surface. Osher and Sethian propose to evolve the distance function with a Hamilton-Jacobi equation. Unfortunately the solution to this equation is not a distance function. As a consequence, the practical application of the level set method is plagued with such questions as: when do we have to “reinitialize” the distance function? How do we “reinitialize” the distance function? These questions reveal a disagreement between the theory and its implementation. This paper proposes an alternative to the use of Hamilton-Jacobi equations which eliminates this contradiction: in our method the implicit representation always remains a distance function by construction, and the implementation does not differ from the theory anymore. This is achieved through the introduction of a new equation. Besides its theoretical advantages, the proposed method also has several practical advantages, which we demonstrate in two applications: (i) the segmentation of the human cortex surfaces from MRI images using two coupled surfaces [26], (ii) the construction of a hierarchy of Euclidean skeletons of a 3D surface.



1 Introduction and previous work 

We consider a family of hypersurfaces $S(p, t)$ in $\mathbb{R}^n$, where p parameterizes the surface and t is the time, that evolve according to the following PDE:

$$\frac{\partial S}{\partial t} = \beta \mathcal{N} \qquad (1)$$

with initial conditions $S(t = 0) = S_0$, where $\mathcal{N}$ is the inward unit normal vector of S, β is a velocity function and $S_0$ is some initial closed surface.

Curve evolution methods for segmentation, tracking and registration were introduced in computer vision by Kass, Witkin and Terzopoulos [15]. These evolutions were reformulated by Caselles, Kimmel and Sapiro [7] and by Kichenassamy et al. [16] in the context of PDE-driven curves and surfaces. There is an extensive literature that addresses the theoretical aspects of these PDEs and offers geometrical interpretations as well as results of uniqueness and existence [13,14,9].

Level set methods were first introduced by Osher and Sethian in [21] in the context of fluid mechanics and provide both a nice theoretical framework and efficient practical tools for solving such PDEs. In those methods, the evolution (1) is achieved by means of an implicit representation of the surface S.

The key idea in Osher and Sethian’s approach is to introduce a function $u : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ such that

$$u(S, t) = 0 \quad \forall t \qquad (2)$$

By differentiation (and along with $\mathcal{N} = -\frac{\nabla u}{|\nabla u|}$ and (1)), we obtain the Hamilton-Jacobi¹ equation:

$$\frac{\partial u}{\partial t} = \beta |\nabla u| \qquad (3)$$

with initial conditions $u(\cdot, 0) = u_0(\cdot)$, where $u_0$ is some initial function $\mathbb{R}^n \to \mathbb{R}$ such that $u_0(S_0) = 0$. It has been proved that for a large class of functions u and $u_0$, the zero level set at time t of the solution of (3) is the solution at time t of (1).

Regarding the function $u_0$, it is most often chosen to be the signed distance function to the closed surface $S_0$. This particular implicit function can be characterized by the two equations:

$$\{x \in \mathbb{R}^n : u_0(x) = 0\} = S_0 \qquad \text{and} \qquad |\nabla u_0| = 1$$

Indeed, the magnitude of the gradient of $u_0$ is equal to the magnitude of the derivative of the distance function from $S_0$ in the direction normal to $S_0$, i.e., it is equal to 1.
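As a quick numerical check of this characterization, the following Python sketch samples the signed distance function to a sphere and estimates $|\nabla u_0|$ with central differences. The sphere, the sign convention (negative inside, positive outside) and the step sizes are assumptions made for the illustration.

```python
import math

def signed_distance_sphere(p, radius):
    """Signed distance to a sphere centred at the origin: negative inside, positive outside."""
    return math.sqrt(sum(c * c for c in p)) - radius

def grad_norm(u, p, h):
    """|grad u| at the point p, estimated with central differences of step h."""
    total = 0.0
    for k in range(3):
        pp, pm = list(p), list(p)
        pp[k] += h
        pm[k] -= h
        total += ((u(pp) - u(pm)) / (2.0 * h)) ** 2
    return math.sqrt(total)

R, h = 1.0, 0.05
u0 = lambda p: signed_distance_sphere(p, R)
# Sample points every 0.25 in [-1.5, 1.5]^3, excluding the centre (where the distance
# function is not differentiable) and its six nearest sample points.
samples = [(i * h, j * h, k * h)
           for i in range(-30, 31, 5)
           for j in range(-30, 31, 5)
           for k in range(-30, 31, 5)
           if i * i + j * j + k * k > 25]
worst = max(abs(grad_norm(u0, p, h) - 1.0) for p in samples)
print(worst)  # small: |grad u0| = 1 up to discretization error
```

The remaining deviation is the finite-difference error, which is largest close to the centre, where the level sets are most curved.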

It is known from [5] that the solution u of (3) is not the signed distance function to the solution S of (1). This causes several problems, which are analyzed in the following section.

It is also important to notice that β in (3) is defined in $\mathbb{R}^n$, whereas in (1) it is defined on the surface S. The extension of β from S to the whole domain $\mathbb{R}^n$ is a crucial point for the analysis and implementation of (3). There are mainly two ways of doing this.

(i) Most of the time this extension is natural. For example, if $\beta = H_S$, the mean curvature of S in (1), one can choose $\beta = H_u$, the mean curvature of the level set of u passing through x in (3).

(ii) In some cases [24,20,2], this extension is not possible. Then one may assign to β(x) in (3) the value of β(y) in (1), where y is the closest point to x belonging to S. The problem with this extension is that it hides an important dependence of β in (3) with respect to u, and we show in section 4 that in this case (3) is not a Hamilton-Jacobi equation.

The thrust of this paper is a reformulation of the level set methods introduced by Osher and Sethian in [21] to eliminate some of the problems that are attached to them, e.g. the need to periodically reinitialize the distance function or the need to “invent” a velocity field away from the evolving front or zero level set. The implications of our work are both theoretical and practical.

2 Why Hamilton-Jacobi equation (3) does not preserve distance functions.

In this section, we suppose that β is extended as explained in (i). The fact that the solutions to Hamilton-Jacobi equations of the form (3) are not distance functions has been proved formally in [5]. A convincing geometrical interpretation of this fact is now given through two short examples.



¹ The difference between a Hamilton-Jacobi equation and a general first order PDE is that the unknown function (here u) does not appear explicitly in the equation.
2.1 First example



Let us consider the problem of segmenting a known object (an ellipse) in an image by minimizing the energy of a curve [8]. Let us force the initial curve to be exactly the solution (the known ellipse), initialize $u_0$ to the signed distance function to this ellipse, and then evolve u with the Hamilton-Jacobi equation (3).

It is obvious that the zero level set of u (let us call this ellipse $S_0$) will not evolve, since it is the solution to (1) and $\beta(x \in S_0) = 0$.

Notice however that replacing 0 by $\varepsilon \in \mathbb{R}$ in (2) implies, by differentiation, the same equation (3), which means that the ε level set of u (let us call this curve $S_\varepsilon$) also evolves according to $\frac{\partial S}{\partial t} = \beta \mathcal{N}$. Consequently, $\beta(x \in S_\varepsilon) \neq 0$ and $S_\varepsilon$ evolves toward $S_0$ in order to minimize its energy (cf. fig. (1)).




Fig. 1. All the level sets of u (shown as single curves) move towards the ellipse $S_0$ in order to minimize their own energy, with the effect that the distance function is not preserved.

This shows that the shock wave equation (3) requires that all the level sets of u converge to the ellipse $S_0$, and therefore that $|\nabla u|$ increases dangerously.



2.2 Second example 



A point M with coordinate $x \in \mathbb{R}$ and energy $E(x) = \frac{x^2}{2}$ is moving along the real line in order to minimize its energy. We force the point M to be at $x_0 \neq 0$ at t = 0. The level set version of this problem is to define $u_0$ on the real line as $u_0(x) = x - x_0$ and to evolve u with the Hamilton-Jacobi equation $\frac{\partial u}{\partial t} = x \left|\frac{\partial u}{\partial x}\right|$.

The solution is $u(x, t) = e^t x - x_0$. Figure (2) shows u at 3 time instants ($0 = t_0 < t_1 < t_2$). The zero level set of u is indeed traveling to the origin O, but the slope of u is $\frac{\partial u}{\partial x} = e^t$ and increases exponentially in time.

The second example is a rephrasing of what happens in the normal direction to the evolving curve in the first example. It is now obvious why driving all the level sets of u with (3) cannot conserve distance functions and in addition leads to unbounded values of $|\nabla u|$. In practical applications, one is compelled to “reinitialize” the implicit function u to be a distance function, which is obviously a contradiction and which shows a gap between the theory and its real application.
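The exponential growth of the slope can be reproduced numerically. The Python sketch below integrates the equation of this example, written as ∂u/∂t = x ∂u/∂x (its form while ∂u/∂x > 0), with an explicit first-order upwind scheme starting from u_0(x) = x - x_0. The grid, the time step and the value x_0 = 0.5 are arbitrary choices for the illustration.

```python
import math

def zero_crossing(xs, u):
    """Position of the first sign change of u on the grid xs (linear interpolation)."""
    for i in range(len(xs) - 1):
        if u[i] == 0.0:
            return xs[i]
        if u[i] * u[i + 1] < 0.0:
            s = u[i] / (u[i] - u[i + 1])
            return xs[i] + s * (xs[i + 1] - xs[i])
    raise ValueError("u has no zero crossing on this grid")

def hj_step(xs, u, dt):
    """One explicit upwind step of u_t = x u_x, whose characteristics move toward x = 0."""
    dx = xs[1] - xs[0]
    n = len(xs)
    new = u[:]
    for i, x in enumerate(xs):
        if x > 0:   # information arrives from the right: forward difference
            j, k = (i, i + 1) if i + 1 < n else (i - 1, i)
        else:       # information arrives from the left: backward difference
            j, k = (i - 1, i) if i > 0 else (i, i + 1)
        # (at the two ends of the grid, the only available one-sided difference is used)
        new[i] = u[i] + dt * x * (u[k] - u[j]) / dx
    return new

x0, dx, dt, T = 0.5, 0.01, 0.001, 1.0
xs = [-1.0 + i * dx for i in range(201)]
u = [x - x0 for x in xs]            # u_0(x) = x - x_0, slope 1
for _ in range(round(T / dt)):
    u = hj_step(xs, u, dt)

slope = (u[-1] - u[0]) / (xs[-1] - xs[0])
print(zero_crossing(xs, u), x0 * math.exp(-T))  # front close to x_0 e^{-T}
print(slope, math.exp(T))                        # slope close to e^T, not 1
```

On linear data the upwind differences are exact, so each step multiplies the slope by (1 + dt): after T = 1 it is about 2.717, while the front has moved to about 0.184, as in the analytic solution.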

In the next section, we convince the reader that maintaining u as a distance function (i.e. such that $|\nabla u| = 1$) throughout the evolution is definitely desirable, and sometimes crucial.
[Figure: graph of u(x, t) against x at three instants, with the corresponding positions M(t_0), M(t_1), M(t_2) of the point M on the horizontal axis.]

Fig. 2. The point M moves on the horizontal line in order to minimize its energy $E(x) = \frac{x^2}{2}$. The function u, initially of slope 1, becomes more and more vertical.



3 Why we should preserve the distance function. 

There are at least two reasons for preserving the signed distance function to the 
evolving surface, a theoretical one and a practical one. 

(i) From the theoretical viewpoint, the implicit description of S (seen as a subset of $\mathbb{R}^n$) and its signed distance function u are equivalent descriptions. Indeed, given any surface S, its signed distance function is uniquely defined. Conversely, any implicit function u satisfying $|\nabla u| = 1$ is the signed distance function to a surface plus a constant (this last constant is taken equal to 0 on the surface) [4]. Since these descriptions are equivalent, one can immediately transpose properties of the first one into properties of the second one and vice versa. For example, u has converged if and only if S has converged (which is not true with the Hamilton-Jacobi equation (3), according to the previous section).

Moreover, one can deduce interesting intrinsic properties of S from local
knowledge of u. In [3], it is proved that the second fundamental form of S can
be computed using the derivatives of the squared distance function. In addition, 
some applications in medical image analysis such as the segmentation of the 
cortex using two coupled surfaces [26] assume that the distance between the 
surfaces is known at any time. As a last example, the computation of the skeleton 
of a surface requires the detection of the singularities of its distance function [18]. 

(ii) From the practical viewpoint, the numerical approximation of the derivatives of u by finite differences requires the choice of a spatial step dx. One chooses a small dx if the slope (the gradient) of the function is large and a larger dx if the function has small variations. Since level sets are most often implemented on regular grids, it is more efficient to use the same step dx = 1 for each grid point. This approximation is clearly more accurate if the norm of the gradient of u is known, which is the case with distance functions since $|\nabla u| = 1$. Keeping $|\nabla u|$ bounded ensures that the derivatives of u are always computable without the need to “reinitialize” u.

We now describe a new approach that preserves the signed distance function 
and therefore meets these two requirements. 
4 How to preserve the signed distance function.



In this section, we suppose that $u_0 = u(\cdot, 0)$ is initialized at t = 0 as the signed distance function to the initial surface $S_0$.

The basic idea is to change equation (3) in such a way that at each time instant u is the signed distance function to the solution S of (1). In order to achieve this goal, we look for a function $B : \mathbb{R}^n \times \mathbb{R}^+ \to \mathbb{R}$ such that $\frac{\partial u}{\partial t} = B$ and which satisfies the two constraints: (i) $x \mapsto u(x, \cdot)$ is a distance function, (ii) the zero level set of u evolves according to (1).

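We express these constraints with the system of equations:

$$B|_{u=0} = \beta \qquad (4)$$

$$\frac{\partial u}{\partial t} = B \qquad (5)$$

$$|\nabla u| = 1 \qquad (6)$$

where $B|_{u=0}$ denotes the restriction of B to the zero level set of u. By differentiating (5) in space, and the square of (6) in time, we obtain:

$$\nabla \frac{\partial u}{\partial t} = \nabla B \qquad \text{and} \qquad \frac{\partial \nabla u}{\partial t} \cdot \nabla u = 0 \qquad (7)$$

Using the Schwarz equality $\frac{\partial \nabla u}{\partial t} = \nabla \frac{\partial u}{\partial t}$, we get:

$$\nabla u \cdot \nabla B = 0 \qquad (8)$$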



which, together with (4) and (5), determines the function B. Relation (8) states that the function B does not vary along the characteristics of u (the characteristics of u are the integral curves of $\nabla u$). It also means that the characteristics of u and B are orthogonal.

In order to go one step further in the resolution of the system, we must recall 
an important property [4]: the characteristics of distance functions are 
straight lines (cf. fig. (3)). 




[Figure: the zero level set u = 0, a level set u = cst, and a straight characteristic of ∇u along which B = cst.]

Fig. 3. Characteristic curves of the field $\nabla u$.

This implies that B is constant along straight lines. These lines (or rays) intersect the zero level set of u at a point where B is known according to (4).

Given any point $x \in \mathbb{R}^n$, an equation of the characteristic of u passing through x is $\lambda \mapsto x - \lambda \nabla u$. Since the signed distance of x to the zero level set is u(x) and $|\nabla u(x)| = 1$, the point $y = x - u \nabla u$ is on the zero level set of u. Notice that y is the closest point to x such that u(y) = 0. According to the last reasoning, we have $B(x) = B(y) = \beta(x - u \nabla u)$. Therefore, the solution to the initial system is:

$$\frac{\partial u}{\partial t} = \beta(x - u \nabla u) \qquad (9)$$

with initial condition $u(\cdot, 0) = u_0(\cdot)$. This equation² is the main result of the paper. Note that equation (9) is not a Hamilton-Jacobi equation, since u appears in the right-hand side and plays a major role. An interpretation of (9) is the following: the zero level set of u is driven by $\frac{\partial S}{\partial t} = \beta \mathcal{N}$, as proposed by Osher and Sethian. The evolution of this particular surface geometrically defines (by propagation) the evolution of all other level sets.
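For comparison, the sketch below advances the one-dimensional example of section 2.2 with equation (9) instead of (3). It is a minimal illustration under assumptions: we identify that example with (3) for β(x) = x, evaluate β at the foot point x - u ∂u/∂x as (9) prescribes, and estimate ∂u/∂x with central differences on an arbitrary grid.

```python
import math

def derivative(xs, u):
    """du/dx: central differences inside the grid, one-sided at the two ends."""
    dx = xs[1] - xs[0]
    n = len(u)
    d = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        d.append((u[hi] - u[lo]) / ((hi - lo) * dx))
    return d

def distance_preserving_step(xs, u, beta, dt):
    """One explicit step of (9): u <- u + dt * beta(x - u du/dx)."""
    ux = derivative(xs, u)
    return [ui + dt * beta(x - ui * uxi) for x, ui, uxi in zip(xs, u, ux)]

x0, dx, dt, T = 0.5, 0.01, 0.001, 1.0
xs = [-1.0 + i * dx for i in range(201)]
u = [x - x0 for x in xs]                  # signed distance to the point x_0
for _ in range(round(T / dt)):
    u = distance_preserving_step(xs, u, lambda y: y, dt)   # beta(y) = y

slope = (u[-1] - u[0]) / (xs[-1] - xs[0])
front = xs[0] - u[0]                      # u stays x - front, so this is its zero
print(slope, front, x0 * math.exp(-T))    # slope stays 1; front close to x_0 e^{-T}
```

The front moves essentially as before (to about 0.184 at T = 1), but since every grid value receives the same increment β(front), u remains the signed distance x - front and its slope stays equal to 1.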

Remark: a posteriori, one guesses that the integral version of equation (9) is the equation $u(S + \lambda \mathcal{N}, t) = -\lambda \;\; \forall t, \lambda$. This can be proved by differentiation with respect to t and λ (note that $\nabla u \cdot \mathcal{N} = -1$). It states that the surface parallel to S at distance λ from S, on the side of $\mathcal{N}$, should be the $-\lambda$ level set of u. This is to be compared to the constraint $u(S, t) = 0 \;\; \forall t$ introduced by Osher and Sethian.



The uniqueness of the closest point y to x such that u(y) = 0 is only guaranteed if $\nabla u(x)$ exists. The set of points of $\mathbb{R}^n$ where $\nabla u$ is not defined is called the skeleton of S (cf. fig. (4)).




Fig. 4. The skeleton of the zero level set is determined by the points where $\nabla u$ is not defined.

Skeletons are very important in computer vision [6,17,22]. Since they turn out to be a byproduct of the proposed evolution, we describe in the next section an implementation of equation (9) in which special care is taken in the computation of the skeleton.



² Equation (9) looks simple but is not. Consider for example the case of mean curvature flow: (9) reads $\frac{\partial u}{\partial t}(x, t) = \operatorname{div}\big(\nabla u(x - u(x, t)\nabla u(x, t), t)\big)$, which is not a PDE (indeed, two different points in $\mathbb{R}^n \times \mathbb{R}$ are considered, namely $(x, t)$ and $(x - u\nabla u, t)$). However, notice that $u(x - u\nabla u) = 0$, $\nabla u(x - u\nabla u) = \nabla u(x)$, and according to [3], the second fundamental form at $x - u\nabla u$ can be computed using the derivatives of $u(x, t)$ up to the third order. This shows that for a large class of velocity functions (in particular for mean curvature flow), (9) is indeed a PDE.
5 Implementation

In this section, we propose a straightforward implementation of the previous theory. The function u is initialized as the signed distance function to the initial surface. We fix u at a particular instant t and compute the real field $B(x, t) = \beta(x - u\nabla u)$ on a narrow band [10,19,1] of S. Once B is known, u can be updated by $u(x, t + dt) = u(x, t) + B(x, t)\,dt$. The computation of B is done in two steps corresponding respectively to equations (4) and (8). The difficulty is that we work on a discrete grid, and this can have dramatic consequences if the sampling effects are not handled with proper care.

In order to deal with those effects, we introduce some notation. Points of $\mathbb{R}^3$ such that none of their coordinates is an integer will be denoted by lower case letters, e.g. x, and called real points. Points of $\mathbb{N}^3$, where $\mathbb{N}$ is the set of natural numbers, will be denoted by upper case letters, e.g. X, and called voxels. We can think of x as a point falling in a cube formed by eight voxels. We denote this set of eight voxels by V(x).

If f is a function defined on $\mathbb{N}^3$ and x is a real point such that the values of f are known at all voxels of V(x), we denote by $f_I(x)$ the value of the trilinear interpolation at x. In detail, if $x = (x_1, x_2, x_3) = (n_1 + \varepsilon_1, n_2 + \varepsilon_2, n_3 + \varepsilon_3)$, where $n_i \in \mathbb{N}$ and $0 < \varepsilon_i < 1$, then we have by a simple linear interpolation $f_I(x_1, x_2, x_3) = (1 - \varepsilon_1) f(n_1, x_2, x_3) + \varepsilon_1 f(n_1 + 1, x_2, x_3)$. By applying this rule recursively to $f(n_1, x_2, x_3)$ and $f(n_1 + 1, x_2, x_3)$, one expresses $f_I(x)$ as a linear combination of the samples of f at the voxels of V(x), the weights being third order polynomials of the coordinates $(\varepsilon_1, \varepsilon_2, \varepsilon_3)$.
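A direct Python transcription of the trilinear interpolation $f_I$, given as a sketch: the voxel samples are passed as a callable on integer triples, which is an assumption made to keep the example self-contained. Since each weight is a product of $(1 - \varepsilon_i)$ or $\varepsilon_i$, the interpolant reproduces exactly any function that is linear in each coordinate separately, which gives a simple check.

```python
import math

def trilinear(f, x):
    """Trilinear interpolation f_I(x) of voxel samples f(n1, n2, n3) at the real point x.

    f is any callable on integer triples (for example a lookup into a 3D array)."""
    n = [math.floor(c) for c in x]
    eps = [c - ni for c, ni in zip(x, n)]
    value = 0.0
    for d1 in (0, 1):
        for d2 in (0, 1):
            for d3 in (0, 1):
                # product of (1 - eps_i) or eps_i: a polynomial of degree <= 3 in eps
                w = ((eps[0] if d1 else 1.0 - eps[0])
                     * (eps[1] if d2 else 1.0 - eps[1])
                     * (eps[2] if d3 else 1.0 - eps[2]))
                value += w * f(n[0] + d1, n[1] + d2, n[2] + d3)
    return value

affine = lambda i, j, k: i + 2 * j + 3 * k
product_ijk = lambda i, j, k: i * j * k
print(trilinear(affine, (0.25, 1.5, 2.75)))     # reproduces 0.25 + 3 + 8.25
print(trilinear(product_ijk, (0.5, 0.5, 0.5)))  # reproduces 0.125
```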

Let A(X) be the 26-neighborhood of the voxel X. Since generically the zero level set of u is composed of real points, we need to determine when a voxel X is adjacent to this zero level set. Consider the function $C_u$ defined on the voxels of the grid such that $C_u(X) = 0$ if $u(X) > 0$ and $C_u(X) = 1$ if $u(X) < 0$. A voxel X is said to be adjacent to the zero level set of u if $\exists Y \in A(X)$, $C_u(Y) \neq C_u(X)$. We call Z the set of voxels adjacent to the zero level set of u. We are now in a position to describe the two steps of our computation.
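The construction of Z can be sketched in Python, with the grid samples of u stored in a dictionary indexed by voxels (a stand-in for a 3D array). The definition of $C_u$ above does not say how to label voxels where u(X) = 0; the sketch labels them 1, which is an assumption. Voxels on the border of the grid simply have fewer neighbours.

```python
from itertools import product

def c_label(value):
    """C_u: 0 where u > 0 and 1 where u < 0; u == 0 is also labelled 1 (an assumption)."""
    return 0 if value > 0 else 1

def adjacent_voxels(u):
    """Set Z of voxels X with a 26-neighbour Y such that C_u(Y) != C_u(X).

    u maps integer triples (the voxels of the grid) to samples of the implicit function."""
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    z = set()
    for X, value in u.items():
        cx = c_label(value)
        for d in offsets:
            Y = (X[0] + d[0], X[1] + d[1], X[2] + d[2])
            if Y in u and c_label(u[Y]) != cx:
                z.add(X)
                break
    return z

# Hypothetical 4x4x4 grid whose zero level set is the plane x1 = 1.5.
u = {X: X[0] - 1.5 for X in product(range(4), repeat=3)}
Z = adjacent_voxels(u)
print(len(Z), sorted({X[0] for X in Z}))  # 32 [1, 2]
```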



5.1 First step: computation of β on Z

The first step is the computation of β on Z. These values are stored in a temporary buffer called $B^Z$. There are two ways to do this. If β is defined on $\mathbb{R}^3$, then one can assign $B^Z(X) = \beta(X)\ \forall X \in Z$. If β is only defined on the nodes of a mesh describing the zero level set of u, then one can assign $B^Z(X) = \beta(v_i)\ \forall X \in Z$, where $v_i$ is the node of the mesh closest to the voxel X. In both cases, the final value of B(X) is not the value of $B^Z(X)$, as explained in the second step.

Notice that the definition of Z ensures that if $u_I(x) = 0$ then $V(x) \subset Z$, and consequently $B^Z_I(x)$ can be computed.



5.2 Second step: computation of B on the narrow band 

The purpose is to propagate the values of B from Z to the whole narrow band. This is done by $B(X, t) = B^Z_I(y, t)$, where $u_I(y) = 0$ and y lies on the same characteristic of u as X. Computing $y = X - u\nabla u$ directly is not robust, since small errors in $\nabla u$ may introduce larger errors (proportional to u) in y. Instead, we follow the characteristic passing through X by unit steps:

$$y_0 = X, \qquad y_{n+1} = \begin{cases} y_n - \max\big(u_I(y_n), \operatorname{sign}(u_I(y_n))\big)\, \nabla_I u(y_n) & \text{if } u_I(y_n) < 0 \\ y_n - \min\big(u_I(y_n), \operatorname{sign}(u_I(y_n))\big)\, \nabla_I u(y_n) & \text{if } u_I(y_n) > 0 \end{cases}$$

and the march stops when $u_I(y_n) = 0$.
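The marching rule can be sketched in Python as follows, assuming $u_I$ and $\nabla_I u$ are supplied as callables (in the implementation they would be trilinear interpolants of grid samples, with $\nabla u$ computed by the scheme described below). Since a discrete march only reaches $u_I(y_n) = 0$ approximately, the tolerance and the cap on the number of steps are assumptions of the sketch; B(X) would then be read as $B^Z_I$ at the returned point.

```python
import math

def march_to_zero_level(x, u, grad_u, tol=1e-9, max_steps=1000):
    """Follow the characteristic through x toward u = 0 with steps of length at most 1.

    u and grad_u stand in for the interpolants u_I and grad_I u of the text."""
    y = list(x)
    for _ in range(max_steps):
        v = u(y)
        if abs(v) < tol:          # discrete stand-in for the condition u_I(y_n) = 0
            return y
        step = min(v, 1.0) if v > 0 else max(v, -1.0)   # min(u, sign u) / max(u, sign u)
        g = grad_u(y)
        y = [yi - step * gi for yi, gi in zip(y, g)]
    raise RuntimeError("marching did not reach the zero level set")

# Hypothetical example: signed distance to a sphere of radius 1.5 and its exact gradient.
sd = lambda p: math.sqrt(sum(c * c for c in p)) - 1.5
gr = lambda p: [c / math.sqrt(sum(q * q for q in p)) for c in p]
print(march_to_zero_level([5.0, 0.0, 0.0], sd, gr))  # steps 1, 1, 1, 0.5 -> [1.5, 0.0, 0.0]
print(march_to_zero_level([0.0, 0.2, 0.0], sd, gr))  # from inside, ends near [0, 1.5, 0]
```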

This marching is done for each voxel in the narrow band, even those of Z. The computation of the march direction $\nabla_I u(y_n)$ requires the evaluation of $\nabla u$ at voxels of the grid. The choice of the numerical scheme for $\nabla u(X)$ is crucial, since it may introduce unrecoverable errors if X lies on the skeleton of S. Our choice is based on the schemes used in the resolution of Hamilton-Jacobi equations where shocks occur [25,23]. These schemes use switch functions which turn on or off whenever a shock is detected. We now make our choice explicit. Let $D_x^+ u = u(i+1, j, k) - u(i, j, k)$ and $D_x^- u = u(i, j, k) - u(i-1, j, k)$, with similar expressions for $D_y^\pm$ and $D_z^\pm$. We form the eight estimators $D^i u$, $i = 1, \ldots, 8$, of $\nabla u$, namely $D^1 u = (D_x^+ u, D_y^+ u, D_z^+ u)$, $D^2 u = (D_x^+ u, D_y^+ u, D_z^- u)$, ..., $D^8 u = (D_x^- u, D_y^- u, D_z^- u)$.

In our current implementation we use $\nabla u(X) = D^{i^*} u(X)$, where $i^* = \operatorname{ArgMax}_i |D^i u(X)|$. Indeed, apart from points on the skeleton of S where $\nabla u$ is undefined, $|\nabla u(X)|$, which should be equal to 1 since u is a distance function, is found in practice to be less than or equal to 1 depending on which of the operators $D^i$ we use. Hence the direction of maximum slope at X is the direction of the closest point to X of the zero level set of u. The fact that the skeleton can be detected by comparing the vectors $D^1 u, D^2 u, \ldots, D^8 u$ is discussed in section 6.2.



6 Applications 

We now describe two applications where our new method is shown to work 
significantly better than previous ones. 



6.1 Cortex segmentation using coupled surfaces. 

We have implemented the segmentation of the cortical gray matter (a volumetric layer of variable thickness (≈ 3mm)) from MRI volumetric data using two coupled surfaces, as proposed by Zeng et al. in [26]. The idea put forward in [26] is to evolve two surfaces simultaneously with equations of the form (1). An inner surface $S_{in}$ captures the boundary between the white and the gray matter and an outer surface $S_{out}$ captures the exterior boundary of the gray matter. The segmented cortical gray matter is the volume between these two surfaces. The velocities of the two surfaces are:



$$\beta_{in} = f(I - I_{in}) + C(u_{out} + \varepsilon) \qquad (10)$$

$$\beta_{out} = f(I - I_{out}) + C(u_{in} - \varepsilon) \qquad (11)$$
where I is the local gray intensity of the MRI image, $I_{in}$ and $I_{out}$ are two thresholds ($I_{in}$ for the white matter and $I_{out}$ for the gray matter), ε is the desired thickness, and C and f have the shape of figure (5).



Let us interpret equation (10). The first term $f(I - I_{in})$ forces the gray level values to be close to $I_{in}$ on $S_{in}$: it is the data attachment velocity term. The second term $C(u_{out} + \varepsilon)$ models the interaction between $S_{out}$ and $S_{in}$: it is the coupling term. According to the shape of C, see figure (5), if locally the two surfaces are at a distance ε = 3mm, then the coupling term has no effect (C = 0) and $S_{in}$ evolves in order to satisfy its data attachment term. If the local distance between $S_{in}$ and $S_{out}$ is too small (< ε), then C > 0 and $S_{in}$ slows down in order to get further from $S_{out}$. If the local distance between $S_{in}$ and $S_{out}$ is too large (> ε), then C < 0 and $S_{in}$ speeds up in order to move closer to $S_{out}$. A similar interpretation can be given for (11).



If these evolutions are implemented with the Hamilton-Jacobi equation (3), then the following occurs: the magnitudes of the gradients of $u_{out}$ and $u_{in}$ increase with time ($|\nabla u_{out}| > 1$ and $|\nabla u_{in}| > 1$). As a consequence, the distance between $S_{in}$ and $S_{out}$, which is estimated as $u_{in}(x)$ for x on $S_{out}$ and $u_{out}(x)$ for x on $S_{in}$, is overestimated. Since the coupling term is then negative in (10) and positive in (11), both $S_{out}$ and $S_{in}$ evolve so as to become closer and closer to each other (until the inevitable reinitialization of the distance functions is performed). In other words, with the standard implementation of level sets, the incorrect evaluation of the distance functions prevents the coupling term from acting correctly and, consequently, also prevents the data attachment terms from playing their roles.



On the other hand, if these evolutions are implemented with our new PDE, 
then a much better interaction between the two terms is achieved since the data 
attachment term can fully play its role as soon as the distance between the two 
surfaces is correct (cf. fig. (6)). 

These results are demonstrated in figure (6), which we now discuss. Each row corresponds to a different 32 × 32 sub-slice of an MRI image. The first column shows the original data, and some regions of interest (concavities) are labeled A, B and C. The second column shows a simple thresholding at $I_{in}$ and $I_{out}$. The third column shows the cross-sections of $S_{in}$ and $S_{out}$ through the slices if the coupling terms are not taken into account. This is why these curves have the same shape as in the second column. One observes that the segmented gray matter does not have the desired regular thickness. In the fourth column, the coupling terms are taken into account and the evolutions (10) and (11) are implemented with the Hamilton-Jacobi equation (3). One observes (in particular at the concavities indicated in the first column) that the distance constraint is well satisfied but the data attachment term was neglected. This is due to the fact that with (3) the distance between the two surfaces is overestimated. In the fifth column, this same evolution is implemented with the new PDE (9) introduced in this paper. One can observe a much better result at concavities. This is due to the fact that the coupling terms stop having any effect as soon as the distance between the surfaces is correct, allowing the data term to drive the surfaces correctly according to the gray level values.

Fig. 6. Results of the segmentation of the gray matter using different algorithms, see text.

6.2 Extraction of the skeleton of an evolving surface. 

Skeletons are widely used in computer vision to describe global properties of 
objects. This representation is useful in tasks such as object recognition and 
registration because of its compactness [6,17,22]. 

One of the advantages of our new level set technique is that it provides, 
almost for free, at each time instant a description of the skeleton of the evolving 
surface or zero level set. 

We show an example of this on one of the results of the segmentation described in the previous section. We take the outside surface of the cortex and simplify it using mean curvature flow, i.e. the evolution $\frac{\partial S}{\partial t} = H\mathcal{N}$, where H is the mean curvature. This evolution is shown in the first column of figure 7. Since the distance function u to the zero level set is preserved at every step, it is quite simple to extract the skeleton from it by using the fact that the skeleton is the set of points where $\nabla u$ is not defined [6]. This is shown in the right column of figure 7. Each surface is rescaled in order to occupy the whole image.

The skeletons are computed using the distance function to the evolving surface as follows. We look for the voxels where the eight estimators $D^i u$ of $\nabla u$ defined in section 5 differ significantly, and threshold the simple criterion:

where $(\cdot, \cdot)$ denotes the dot product of two vectors and $Du = \frac{1}{8}\sum_{i=1}^{8} D^i u$.

This can be interpreted as a measure of the variations of the direction of Vu 
(which are large in the neighborhood of the skeleton). 

The results for the left column of figure (7) are shown in the right column of the same figure, where we clearly see how the simplification of the shape of the cortex (left column) goes together with the simplification of its skeleton (right column).

Note that because it preserves the distance function, our framework allows 
the use of more sophisticated criteria for determining the skeleton [18] based on 
this distance function. 



7 Conclusion 

We have proposed a new scheme for solving the problem of evolving, through the technique of level sets, a surface S(t) satisfying a PDE such as (1). This scheme introduces a new PDE, (9), that must be satisfied by the auxiliary function u(t) whose zero level set is the surface S(t). The prominent feature of the new scheme is that the solution to this PDE is the distance function to S(t) at each time instant t. Our approach has many theoretical and practical advantages that were discussed and demonstrated on two applications. Since the distance function to the evolving surface is in most applications the preferred function, we believe that the PDE presented here is an interesting alternative to Hamilton-Jacobi equations, which do not preserve this function.
Abstract. Properties of points in images are often measured using convolution integrals, with each convolution kernel associated to a particular scale and perhaps to other parameters, such as an orientation, as well. Assigning to each point the parameter values that yield the maximum value of the convolution integral gives a map from points in the image to the space of parameters by which the given property is measured. The range of this map is the optimal parameter surface. In this paper, we argue that ridge points for the measured quantity are best computed via the pullback metric from the optimal parameter surface. A relatively simple kernel used to measure the property of medialness is explored in detail. For this example, we discuss connectivity of the optimal parameter surface and the possibility of more than one critical scale for medialness at a given point. We demonstrate that medial loci computed as ridges of medialness are in general agreement with the Blum medial axis.



1 Introduction 

In the article “Scale in Perspective,” [1], Koenderink illustrates the importance 
of scale in measuring how cloudlike a point in the atmosphere is. The property 
of “cloudlikeness” is discussed in terms of the density of condensed water vapor, 
and Koenderink, citing Mason [2], eventually settles on what amounts to the 
following definition: the cloudlikeness at a point in the atmosphere is the average 
density of condensed water vapor in a ball of volume 1 centered at the 
point. The size of the ball used in this measurement is crucial, and Koenderink 
reminds us that all physical measures of density should be thought of as part of 
a one-parameter family of density measures, with the level of resolution of the 
measuring instrument as the parameter. 

Thresholding the value of cloudlikeness (at $0.4\ \mathrm{g\,m^{-3}}$) gives a way to determine the boundary of a particular cloud, but there are other ways that the scalar field of cloudlikeness measures might be used. For example, a more detailed understanding of the cloud might be obtained by examining height ridges [3], generalized maxima of condensed water vapor density, within the cloud. To finish with this introductory example, consider whether we can be truly confident that the single scale represented by balls of volume 1 is exactly correct in all situations. A more prudent strategy would be to measure the cloudlikeness at a point over a range of scales, say from balls of size 0.5 to balls of size 2 m³. We could then agree to define the cloudlikeness at a point to be the maximum, over the agreed-upon range of scales, of these measured densities. This gives another bit of information: not only do we have the cloudlikeness at a point, but also the scale at which the maximum density occurs, and perhaps this extra bit will reveal a deeper level of structure within the cloud.
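The multiscale recipe just described has three steps. Measure an average over balls of several sizes, keep the largest value, and also record the size at which it occurs. The sketch below is a hedged 2D illustration of those steps, not Koenderink's or Mason's procedure. Disks stand in for balls, the density field in the usage check is a synthetic annulus, and the helper name `max_over_scales` is hypothetical.

```python
import numpy as np


def max_over_scales(density, X, Y, point, radii):
    """Average `density` over disks of each radius centered at `point`.

    Returns (best_value, best_radius): the largest disk average over the
    supplied radii and the radius at which it occurs (first one on ties).
    """
    r2 = (X - point[0]) ** 2 + (Y - point[1]) ** 2
    averages = [density[r2 <= r * r].mean() for r in radii]
    k = int(np.argmax(averages))
    return averages[k], radii[k]
```

Consider an annulus of unit density between radii 0.5 and 1, evaluated at its center. Disks smaller than 0.5 see no material. Between 0.5 and 1 the average grows like $1 - 0.25/r^2$. Beyond 1 it decays like $0.75/r^2$. The maximum, about 0.75, therefore occurs at radius 1. A single fixed scale would miss that structure.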

In their papers on “Zoom-invariant Figural Shape” [5], [6], Pizer, Morse, et al. employ the principles outlined above to define and measure a property they call “medialness” at a point within an image. For an object with a clearly defined boundary in a 2-dimensional image, the medial axis transform, first described by Blum [7], is the locus of centers of disks that are maximally inscribed in the object, together with the radii of those disks. This information provides a simple description of the size, shape, and location of the object, and is complete in the sense that the original object can be reconstructed exactly from its medial axis transform. In the presence of image disturbances, the exactness of the reconstruction can be a liability rather than a feature, since the disturbances are also reconstructed from the transform. To overcome this liability, Pizer and his colleagues [5] take a multiscale approach to the extraction of medial loci directly from image intensities (with no pre-processing segmentation of the image into objects required). The idea is to measure the medialness of points via a convolution integral whose kernel involves both a scale and an orientation, and then to extract ridges from these measurements. In [6] they provide strong experimental evidence that these ridges of medialness are insensitive to small-scale image disturbances. They use the term “cores” for such ridges. Having achieved the objective of overcoming small disturbances, they go on to describe applications of cores to problems of segmentation, registration, object recognition, and shape analysis [8], [5]. Indeed, by placing a disk with radius proportional to the optimal scale at each point, a useful Blum-like description of the boundary at the scale of the core (BASOC) is produced.

In this paper we propose that the extraction of height ridges for properties like medialness is best performed using a pullback metric from the parameter space. There are two advantages to using a pullback metric. First, ridges are computed directly in the image instead of being computed in the higher-dimensional product space formed from the image and the measurement parameters. Second, the calculation of ridges is metric dependent, and the pullback metric assigns the proper distance between points of the image, based on the optimal parameters for the measurement in question. An outline of the contents of the paper follows. In section 2, we review definitions of medialness at a point in an image and of the optimal scale surface for medialness. In section 3, we describe the mathematics of pullback metrics and make subsequent computations of gradients, hessians, and convexity ridges. In section 4 we apply the pullback metric from the optimal scale surface to the problem of extracting medial loci, illustrating results for rectangles. In principle, any image property measured pointwise at various scales (and with various auxiliary parameters) is amenable to this treatment. The optimal scale surface for the measurement inherits a metric from the metric on the parameter space; this metric may be pulled back to the original image and provides what we believe to be the correct metric in which to compute ridges for the measured quantity. The approach presented here supplements the recent work of Furst [9], in which a marching-cubes-like algorithm for tracking ridges is developed.

2 Medialness and the Optimal Scale Surface

We adopt the following definition of medialness at a point in a 2-dimensional 
image from the paper of Pizer, Eberly, Morse, and Fritsch [5]. In that paper 
several different medialness kernels are proposed; the kernel below is sufficient 
for our purposes although other kernels may produce medial loci more closely 
associated to Blum’s medial axis. 

Definition 1. Let $x$ be a point in $\mathbb{R}^2$, let $u$ be a unit vector in $\mathbb{R}^2$, and let $\sigma$ be a positive scalar. Let $G_\sigma(x) = \frac{1}{2\pi\sigma^2}\exp(-|x|^2/2\sigma^2)$ be the standard 2D Gaussian with scale $\sigma$, denote the matrix of second partials of $G_\sigma$ at $x$ by $\mathrm{hess}(G_\sigma)|_x$, and set $K(x,\sigma,u) = -\sigma^2 u^T\, \mathrm{hess}(G_\sigma)|_x\, u$. Let $I(x)$ be a 2D image intensity, i.e., a bounded nonnegative function on $\mathbb{R}^2$ having compact support.

Then the parameter-dependent medialness at the point $x$, relative to the intensity function $I$, measured at scale $\sigma$ and with orientation $u$, is

$$m(I,x,\sigma,u) = \int_{\mathbb{R}^2} I(z)\,K(z - x,\sigma,u)\,dz.$$

The medialness at the point $x$, denoted $M(x)$, is the maximum, over scales $\sigma$ and orientations $u$, of $m(I,x,\sigma,u)$. The values of $\sigma$ and $u$ at which the maximum value of $m(I,x,\sigma,u)$ occurs are the optimal parameters.

Lemma 1. The optimal orientation at a point $x$ is completely determined by the optimal scale.

Proof. The medialness function $m(I,x,\sigma,u)$ may be rewritten in the form $u^T A u$, where $A$ is a symmetric matrix whose entries are integrals with values depending on scale. From this form, we see that the optimal orientation at a given scale is the eigendirection for $A$ corresponding to its largest eigenvalue.
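Definition 1 and Lemma 1 translate into a numerical recipe. For each sampled scale, assemble the symmetric matrix $A(\sigma)$ with $m = u^T A u$, take its largest eigenvalue and the matching eigenvector, and keep the best scale. The sketch below is an illustration under stated assumptions, not the authors' code. The integral is approximated by a plain Riemann sum on a grid, the Gaussian is normalized by $1/(2\pi\sigma^2)$, and all function names are hypothetical.

```python
import numpy as np


def medialness_matrix(I, X, Y, x0, sigma, dA):
    """Symmetric A(sigma) with m(I, x0, sigma, u) = u^T A u (Lemma 1)."""
    dx, dy = X - x0[0], Y - x0[1]
    s2 = sigma ** 2
    G = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * s2)) / (2.0 * np.pi * s2)
    # Second partials of the normalized Gaussian, evaluated at z - x0.
    Gxx = (dx ** 2 / s2 ** 2 - 1.0 / s2) * G
    Gyy = (dy ** 2 / s2 ** 2 - 1.0 / s2) * G
    Gxy = (dx * dy / s2 ** 2) * G
    # K = -sigma^2 u^T hess(G) u, so each entry is -sigma^2 times an integral.
    a = -s2 * np.sum(I * Gxx) * dA
    b = -s2 * np.sum(I * Gxy) * dA
    c = -s2 * np.sum(I * Gyy) * dA
    return np.array([[a, b], [b, c]])


def medialness(I, X, Y, x0, sigmas, dA):
    """Return (M(x0), optimal scale, optimal orientation) over sampled scales."""
    best = (-np.inf, None, None)
    for sigma in sigmas:
        w, V = np.linalg.eigh(medialness_matrix(I, X, Y, x0, sigma, dA))
        if w[-1] > best[0]:        # eigh sorts eigenvalues in ascending order
            best = (w[-1], sigma, V[:, -1])
    return best
```

Run at the center of a disk of radius $R$, this should report an optimal scale near $R/\sqrt{2}$. At the center of a long, thin rectangle it should report a scale near half the distance between the long sides, with $u$ across them. Grid spacing and scale sampling limit the accuracy.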

For a given image, denote the support set for the intensity function by $\Omega$. We may “zoom” the image by a constant factor of $\lambda > 0$ by defining a new intensity function $I_\lambda(z) = I(z/\lambda)$. Clearly the support set for the new intensity is the set $\lambda\Omega$. We record the fundamental property of medialness, “zoom-invariance,” in the following proposition.

Proposition 1. The parameter-dependent medialness function is invariant to zoom, meaning that for any positive real number $\lambda$,

$$m(I_\lambda, \lambda x, \lambda\sigma, u) = m(I, x, \sigma, u).$$
Proof. The proof that

$$\int_{\lambda\Omega} I_\lambda(w)\,K(w - \lambda x, \lambda\sigma, u)\,dw = \int_{\Omega} I(z)\,K(z - x, \sigma, u)\,dz$$

is accomplished by means of the change of variables formula for multiple integrals. This property accounts for the factor of $\sigma^2$ in the definition of the medialness kernel.

The graph of the medialness kernel K is centered over the point at which the 
measurement is made, has the shape of a Gaussian in the direction orthogonal to 
u, and has the shape of a Gaussian second derivative in the direction of u. This 
shape makes this kernel particularly effective for locating medial axes of objects 
having parallel sides and uniform interior intensity: orienting the kernel so that 
u is perpendicular to the parallel sides will maximize the value of medialness. 
It is interesting to note that for rectangular objects with aspect ratio less than 
1/5, the scale that maximizes medialness at the center of the rectangle is equal 
to half the distance between the parallel sides. The other simple plane figure 
is the disk of radius $R$: here symmetry dictates that the parameter-dependent medialness measure at the center of the disk is independent of $u$, and the optimal scale is $R/\sqrt{2}$.
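Both values can be checked in closed form. The derivation below assumes the normalized Gaussian and writes $\phi_\sigma$ for the 1D normalized Gaussian. For the rectangle it idealizes the shape as an infinite strip $|z_1| \le a$, so the parallel sides are $2a$ apart. That idealization is our simplification, not the paper's finite rectangle. With $u$ across the strip, the factor along the strip integrates to 1:

$$m(\sigma) = -\sigma^2\int_{-a}^{a}\phi_\sigma''(s)\,ds = -2\sigma^2\phi_\sigma'(a) = \frac{2a}{\sqrt{2\pi}\,\sigma}\,e^{-a^2/2\sigma^2}, \qquad \frac{d}{d\sigma}\log m = -\frac{1}{\sigma} + \frac{a^2}{\sigma^3} = 0 \iff \sigma = a.$$

For the disk, $m$ does not depend on $u$. It therefore equals the average over two orthogonal orientations, which turns the kernel into $-\tfrac{\sigma^2}{2}\Delta G_\sigma$. The divergence theorem then gives

$$m(\sigma) = -\frac{\sigma^2}{2}\int_{|z|\le R}\Delta G_\sigma\,dz = -\frac{\sigma^2}{2}\,2\pi R\,\partial_r G_\sigma(R) = \frac{R^2}{2\sigma^2}\,e^{-R^2/2\sigma^2}.$$

This has the form $t e^{-t}$ with $t = R^2/2\sigma^2$, so it peaks at $t = 1$, i.e. at $\sigma = R/\sqrt{2}$.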

By the term scale space, we mean the Cartesian product $\mathbb{R}^2 \times \mathbb{R}^+ = S$ consisting of points $x$ in the image plane and scales $\sigma$ at which medialness measurements are made. Associating to each point its optimal scale for medialness gives a map $\sigma_0$ from the image plane into scale space given by $x \mapsto (x, \sigma_0(x))$.

Definition 2. The set of points in scale space of the form $(x, \sigma_0(x))$ is the optimal scale surface for medialness.

The nature of the map $\sigma_0$ is not completely understood. In [5] it is claimed that $\sigma_0$ is continuous. In [3], the optimal scale surface is redefined in terms of the smallest scale at which a local maximum for medialness occurs, and further assumptions of continuity and differentiability are made. In [9], reference is made to the possibility of folds in the optimal scale surface and the need to use more general coordinate charts. As we shall illustrate in section 4, the optimal scale surface need not be connected. Nevertheless, we shall continue our development by assuming that there are open sets bounded by Jordan curves in the image plane such that the restriction of $\sigma_0$ to any one of these open sets is twice differentiable. In what follows we shall restrict attention to a single such open set, continuing to refer to the graph of $\sigma_0$ over that open set as the optimal scale surface.

Under the assumption of differentiability, we proceed to define the tangent 
map. 

Definition 3. The tangent map $\sigma_{0*}$ maps vectors in the image plane to vectors in scale space. Let $v$ be a vector in the image plane at the point $x$. Then the vector $\sigma_{0*}(v)$ is the vector based at the point $\sigma_0(x)$ that is defined by taking any parametrized curve $c(t)$ in the image plane whose initial position is $x$ and whose initial velocity is $v$, and setting $\sigma_{0*}(v)$ to be the initial velocity of the curve $\sigma_0(c(t))$.
The vector $\sigma_{0*}(v)$ is well-defined, independent of the curve $c(t)$ by which it is computed. Moreover, the map $\sigma_{0*}$ is a linear map between the (linear) tangent spaces at $x$ and at $\sigma_0(x)$. The important vectors at a point $(x,y)$ in the image plane are the coordinate vectors $\partial_x = (1,0)$ and $\partial_y = (0,1)$. Using the same notation to denote the coordinate vector fields in scale space, we see that the set of vectors $\{\partial_x, \partial_y, \partial_\sigma\}$ forms a basis for the vectors in scale space. In terms of these coordinate bases, we have the familiar formulas

$$\sigma_{0*}(\partial_x) = (1, 0, \sigma_{0x}) \quad\text{and}\quad \sigma_{0*}(\partial_y) = (0, 1, \sigma_{0y}).$$

3 The Scale Space Metric, the Induced Metric, the 
Pullback Metric; Differential Operators and Ridges 

Our notation for the coordinate vector fields in the image plane, $\{\partial_x, \partial_y\}$, belies the fact that we consider these vector fields primarily via their action on functions: applying $\partial_x$ to the function $f$ yields the partial derivative $f_x$, a new function on the image plane. One may extend this action on functions to an action on vector fields and tensors of higher order by defining covariant differentiation. In the presence of a metric, Lemma 2 below indicates that there is a unique preferred notion of covariant differentiation (via the Levi-Civita connection). The development of a suitable metric on the image plane for the extraction of medial loci will take some time. We emphasize that the role of the metric is to define the differential operators by which ridges of medialness are to be computed. Standard references for the ideas and coordinate-free notation of differential geometry presented in this section include [10] and [11].

A Riemannian metric on a differentiable manifold is a means for computing dot products of tangent vectors. This amounts to the assignment of a positive definite (in particular nondegenerate) symmetric bilinear form $h$ on each tangent plane of the manifold. Once such a metric is assigned, arclengths, and subsequently distances between points, may be computed by integrating lengths of velocity vectors along differentiable curves. By the term scale space, we mean the product manifold $\mathbb{R}^2 \times [\sigma_0, \sigma_1]$, where, for a particular 2-dimensional image, the inner scale $\sigma_0$ is the smallest scale consistent with the image production process and the outer scale $\sigma_1$ is the largest scale consistent with the image size [5], [12]. We make the following choice for a metric on scale space.

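Definition 4. Let $v$ and $w$ be vectors in scale space at the point $(x, y, \sigma)$, and let $[v]$ and $[w]$ be the coordinate vectors of $v$ and $w$ relative to the basis $\{\partial_x, \partial_y, \partial_\sigma\}$. Then $h|_{(x,y,\sigma)}(v,w) = [v]^T [h] [w]$, where $[h]$ is the matrix

$$[h] = \frac{1}{\sigma^2}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

This metric is often expressed less formally by writing

$$ds^2 = \frac{dx^2 + dy^2 + d\sigma^2}{\sigma^2}.$$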
The resulting geometry is hyperbolic, having constant sectional curvature $-1$. The rationale for this choice of metric is as follows. If the spatial distance between points $p$ and $q$ in $\mathbb{R}^2$ is $L$ when measured at scale $\sigma$, then the distance between those points when measured at scale $\sigma/2$ is $2L$. Similarly, reported scale differences exhibit the same dependence on the scales at which they are measured. This gives $ds^2 = (dx^2 + dy^2 + \rho^2\,d\sigma^2)/\sigma^2$, and for convenience we may assume that the conversion factor $\rho$ between the units of spatial distance and the units of scale difference is 1.

The metric $h$ induces a metric $h_0$ on the optimal scale surface, simply by restricting $h$ to the tangent planes to that surface. Re-expressing the restriction in terms of the coordinate basis $\{\sigma_{0*}(\partial_x), \sigma_{0*}(\partial_y)\}$ for the tangent plane to the optimal scale surface at the point $\sigma_0(x,y)$ yields the 2-by-2 matrix

$$[\sigma_0^* h_0] = \frac{1}{\sigma_0^2}\begin{pmatrix} 1 + \sigma_{0x}^2 & \sigma_{0x}\sigma_{0y} \\ \sigma_{0x}\sigma_{0y} & 1 + \sigma_{0y}^2 \end{pmatrix} = [g]. \qquad (1)$$

To explain the notation on the right-hand side of (1), we may consider the induced metric $h_0$ as a metric on the original image plane. This is the so-called pullback metric, denoted by $g = \sigma_0^*(h)$ and defined for vectors $v$ and $w$ in the plane of the original image by $g(v,w) = h_0(\sigma_{0*}(v), \sigma_{0*}(w))$. In particular, the matrix $[g]$ relative to the coordinate basis $\{\partial_x, \partial_y\}$ is identical to the matrix for $[h_0]$, as indicated in (1). It is this metric $g$, expressed by the matrix $[g]$ in (1), that we will use to study the medialness function $M$.

Our next task is to define the gradient and hessian of M relative to our metric 
g. We rely on covariant differentiation (which gives a way to differentiate tensor 
fields) in order to accomplish this task. That the task can be accomplished in 
only one way once a metric is prescribed is a consequence of the following lemma 
[11] (in the statement of the lemma g is used to denote an arbitrary metric). 

Lemma 2 (Fundamental Lemma of Riemannian Geometry). On a differentiable manifold $N$ with metric $g$ there is a unique way to define covariant differentiation that is compatible with both the manifold structure of $N$ and the metric $g$, in the sense that the covariant derivative of $g$ satisfies $Dg = 0$.

As indicated in the beginning paragraph of this section, covariant differentiation of functions along vector fields amounts to directional derivatives. For a vector field $v$ and a function $f$, we write $D_v f$ to indicate this derivative. In terms of the coordinate vector fields $\{\partial_x, \partial_y\}$, we may write $v = v_1\partial_x + v_2\partial_y$, and the expression for the directional derivative becomes $D_v f = v_1 f_x + v_2 f_y$. The gradient $\mathrm{grad}\, f$ of the function $f$ is defined by means of the metric $g$: it is the unique vector field satisfying $D_v f = g(\mathrm{grad}\, f, v)$ for every vector field $v$.

Next we consider the covariant derivative of a vector field $w$ along the vector field $v$. Denoted by $D_v w$, this new vector field may again be computed in terms of the metric via the Koszul formula for $g(D_v w, u)$. In this formula, given below, $u$ is an arbitrary vector field, and terms of the form $[v, w]$ involve the Lie bracket of two vector fields. The Lie bracket is a non-metric derivative of one vector field along another; compatibility of $D$ with the manifold structure of $N$ means precisely that $[v, w] = D_v w - D_w v$.

$$2g(D_v w, u) = D_v g(w,u) + D_w g(u,v) - D_u g(v,w) - g(v,[w,u]) + g(w,[u,v]) + g(u,[v,w]).$$

We shall have little use for Lie brackets below, as we will revert to expressing the tensors of interest to us in terms of coordinate vector fields, and the Lie bracket of coordinate vector fields is zero. We note in passing that the Koszul formula, when applied to coordinate vector fields, yields expressions for the classical Christoffel symbols.

The metric-dependent hessian (a symmetric tensor of type (2,0)), defined for any twice differentiable function $f$, may now be written concisely as

$$\mathrm{hess}\, f(v,w) = g(D_v\, \mathrm{grad}\, f, w).$$

When $N$ has dimension 2, we may choose a basis for the tangent space at each point that is orthonormal with respect to $g$, and compute a 2-by-2 matrix $[\mathrm{hess}\, f]$. The trace of this matrix is a new function on $N$, the metric-dependent Laplacian of $f$. Note further that the matrix $[\mathrm{hess}\, f]$ is symmetric and hence diagonalizable over the reals. This leads directly to the consideration of convexity ridges for $f$.
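In coordinates $x^1 = x$, $x^2 = y$ with $g_{ij} = g(\partial_i, \partial_j)$ and summation over repeated indices, the Koszul formula and the definition of the hessian reduce to the standard textbook identities below. They are restated here only as a worked aid:

$$\Gamma^k_{ij} = \tfrac{1}{2}\,g^{kl}\left(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij}\right), \qquad \mathrm{hess}\, f(\partial_i, \partial_j) = \partial_i\partial_j f - \Gamma^k_{ij}\,\partial_k f,$$

$$\mathrm{grad}\, f = g^{ij}\,(\partial_j f)\,\partial_i, \qquad \Delta_g f = g^{ij}\,\mathrm{hess}\, f(\partial_i, \partial_j).$$

When the hessian is written in the coordinate basis rather than a $g$-orthonormal one, the eigenvalues and eigenvectors in Definition 5 are those of $g^{-1}[\mathrm{hess}\, f]$, equivalently of the generalized problem $[\mathrm{hess}\, f]\,e = \lambda\,[g]\,e$.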

Definition 5. Let $f$ be a twice differentiable function on a 2-dimensional manifold $N$ with metric $g$. On the open subset of $N$ where the eigenvalues $\lambda_+ > \lambda_-$ of $\mathrm{hess}\, f$ are distinct, with corresponding eigenvectors $e^+$ and $e^-$, the maximum convexity ridge for $f$ is the set of points where $\lambda_- < 0$ and $g(\mathrm{grad}\, f, e^-) = 0$.

As mentioned in the introduction, convexity ridges [3], [4] are generalized maxima for $f$: at each ridge point, $f$ has a local maximum in the direction of $e^-$, which is transverse to the ridge. It is also worth noting that the current ridge-tracking algorithms of [5] and [9] involve computing the hessian of the medialness function $m$ in 3-dimensional scale space, then considering the restriction of the hessian of $m$ to a 2-dimensional subspace in scale space. The restriction of the 3D hessian to the optimal scale surface in scale space is not the same as the hessian obtained by first restricting the function $m$ to obtain $M$ on the optimal scale surface and then computing in the metric intrinsic to the surface. The restricted 3D hessian involves a second term resulting from the (generally nonzero) curvature of the optimal scale surface in scale space.

With these generalities as foundation, we return to the image plane furnished with its metric $g$, the pullback of the restriction of the hyperbolic scale space metric to the optimal scale surface. The matrix for this metric, in terms of the coordinate vector fields $\{\partial_x, \partial_y\}$, is given by formula (1). The medialness function $M = M(x, y)$ is a function on the image plane, and relative to the basis of coordinate vector fields we have
The maximum convexity ridge for $M$ relative to the metric $g$ constitutes the medial locus for the image. Computing the matrix $[\mathrm{hess}\, M]$ in practice requires knowledge of the second partial derivatives of $M$. Because the Koszul formula for covariant derivatives of coordinate vector fields involves derivatives of the metric $g$, it also requires knowledge of the second partial derivatives of the function $\sigma_0$. We emphasize that no higher-order derivatives are required. By using the pullback metric, ridges of dimension 1 are computed directly in the 2-dimensional image plane rather than being computed in, then projected from, a higher-dimensional parameter space.
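The computation just outlined can be sketched at a single point. The sketch takes the pullback matrix $[g] = \sigma_0^{-2}(I + \nabla\sigma_0\,\nabla\sigma_0^T)$, obtained by restricting $ds^2 = (dx^2+dy^2+d\sigma^2)/\sigma^2$ to the graph of $\sigma_0$ with conversion factor 1. It evaluates Christoffel symbols from the first and second partials of $\sigma_0$, forms the covariant hessian of $M$, and takes its eigenvalues relative to $g$. The function names, the input format and the default tolerance are our own assumptions. This is not the authors' implementation.

```python
import numpy as np


def pullback_metric(s, p):
    """[g] at a point: g_ij = (delta_ij + p_i p_j) / s^2.

    s is sigma_0 and p = (sigma_0x, sigma_0y) is its gradient.
    """
    p = np.asarray(p, dtype=float)
    return (np.eye(2) + np.outer(p, p)) / s ** 2


def metric_derivatives(s, p, S):
    """dg[k, i, j] = d_k g_ij; S is the 2x2 matrix of second partials of sigma_0."""
    p = np.asarray(p, dtype=float)
    S = np.asarray(S, dtype=float)
    base = np.eye(2) + np.outer(p, p)
    dg = np.empty((2, 2, 2))
    for k in range(2):
        d_base = np.outer(S[:, k], p) + np.outer(p, S[:, k])   # d_k (p_i p_j)
        dg[k] = d_base / s ** 2 - 2.0 * base * p[k] / s ** 3   # d_k s^-2 = -2 p_k / s^3
    return dg


def christoffel(g, dg):
    """Gamma[k, i, j] = 1/2 g^{kl} (d_i g_jl + d_j g_il - d_l g_ij)."""
    T = np.einsum("ijl->lij", dg) + np.einsum("jil->lij", dg) - dg
    return 0.5 * np.einsum("kl,lij->kij", np.linalg.inv(g), T)


def covariant_hessian(dM, ddM, Gamma):
    """hess_ij = M_ij - Gamma^k_ij M_k."""
    return np.asarray(ddM, float) - np.einsum("kij,k->ij", Gamma, np.asarray(dM, float))


def ridge_test(s, p, S, dM, ddM, tol=1e-2):
    """Definition 5 at one point: lambda_- < 0 and |g(grad M, e^-)| < tol."""
    g = pullback_metric(s, p)
    H = covariant_hessian(dM, ddM, christoffel(g, metric_derivatives(s, p, S)))
    # Eigenvalues relative to g: H v = lam g v, solved via g = L L^T.
    Linv = np.linalg.inv(np.linalg.cholesky(g))
    lam, Y = np.linalg.eigh(Linv @ H @ Linv.T)   # ascending, lam[0] = lambda_-
    e_minus = Linv.T @ Y[:, 0]                    # g-unit eigenvector for lambda_-
    slope = float(np.dot(dM, e_minus))            # equals g(grad M, e^-)
    is_ridge = bool(lam[0] < 0 and lam[0] < lam[1] and abs(slope) < tol)
    return is_ridge, lam, e_minus
```

Using the generalized eigenproblem, rather than the raw coordinate matrix, makes the eigenvalues and eigenvectors those of the hessian in a $g$-orthonormal frame, which is what Definition 5 requires. Because $g(\mathrm{grad}\,M, e^-)$ equals the directional derivative $D_{e^-}M$, the ridge test never needs $\mathrm{grad}\,M$ explicitly.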



4 Medial Loci for Rectangles 

In this section, we compute medial loci for binary images of rectangles, using the eigenvalues and eigenvectors of $[\mathrm{hess}\, M]$. Computations are performed using a grid of 1600 equally spaced points on the square $[-1,1] \times [-1,1]$. At each point in the grid, approximate values of the optimal scale function $\sigma_0$ and of the medialness function $M$ are computed by sampling scales $\sigma$ with $0.05 \le \sigma \le 1$. These values are then used to generate two-variable quadratic Taylor polynomials for $\sigma_0$ and $M$ centered at each point in the grid, using a least-squares fit to the sampled data. Coefficients of these Taylor expansions are then used as approximate values for the derivatives of $\sigma_0$ and $M$ in the formula for $[\mathrm{hess}\, M]$. Determination of eigenvalues, eigenvectors, and ridge points follows.
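A least-squares quadratic Taylor fit of the kind described can be sketched as follows. It is an illustration, not the authors' implementation. The neighborhood (all grid samples within a chosen radius) and the name `quadratic_taylor_derivs` are our assumptions. The model is written as $c_0 + c_1\,dx + c_2\,dy + \tfrac12 c_3\,dx^2 + c_4\,dx\,dy + \tfrac12 c_5\,dy^2$, so the fitted coefficients read off directly as $f, f_x, f_y, f_{xx}, f_{xy}, f_{yy}$ at the center.

```python
import numpy as np


def quadratic_taylor_derivs(X, Y, F, x0, y0, radius):
    """Least-squares quadratic Taylor fit of samples F near (x0, y0).

    Returns the array [f, f_x, f_y, f_xx, f_xy, f_yy] at (x0, y0).
    """
    dx, dy = X - x0, Y - y0
    near = dx ** 2 + dy ** 2 <= radius ** 2
    dx, dy, f = dx[near], dy[near], F[near]
    A = np.column_stack([np.ones_like(dx), dx, dy,
                         0.5 * dx ** 2, dx * dy, 0.5 * dy ** 2])
    coef, *_ = np.linalg.lstsq(A, f, rcond=None)
    return coef
```

In the setting of this section, one would call it once for $\sigma_0$ and once for $M$ at each grid point. The neighborhood must contain at least six samples in general position for the fit to be determined. A larger radius smooths noise at the cost of bias.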




Fig. 1. The optimal scale surface and the medial locus for a square. 
In Figure 1, we show the optimal scale surface and medial locus for a square as determined by our algorithm. The original intensity function is 1 at each point in the square. The plot at left shows optimal scales for medialness at each point in the square, with lighter shading indicating larger scale. Approximately 1400 of the 1600 grid points have optimal scales larger than 0.75, with a maximum of 0.90. The remaining points, those nearest the edges and corners of the square, have optimal scales smaller than 0.45, the optimal scale being 0.05 at points along the square’s boundary. At right, the maximum convexity ridge for the medialness function $M$, computed using the pullback metric, is shown. The ridge is overlaid on a plot of medialness values, again with lighter shading indicating a higher value. Points on the ridges shown here lie within the set of points where $\sigma_0 > 0.75$ and satisfy $|g(\mathrm{grad}\, M, e^-)| < 0.01$. Near the midpoints of the edges of the square, it can be seen that extremely small scales increase medialness values.

In Figure 2, we consider the function $m(I, (x,x), \sigma, (-1,1)/\sqrt{2})$ at the points (.625, .625), (.65, .65), and (.675, .675). Note the two local maxima for medialness at (.65, .65). From the information in this figure we may conclude that the optimal scale surface for the square is either disconnected or has a fold. The global picture of Figure 1 shows that it is not possible to go from (0.65, 0.65, smaller critical scale) to (0.65, 0.65, larger critical scale) along a path that remains on the optimal scale surface. The option of a fold is therefore ruled out, and we conclude that the optimal scale surface is disconnected.







Fig. 2. Graphs of medialness as a function of scale along the diagonal of the square. 
The center graph shows two critical scales at the point (.65, .65). 



Plots of medialness as a function of scale at other points in the figure exhibit 
similar behavior. As one moves along the horizontal axis of symmetry for the 
square away from its center, the optimal scale increases (with corresponding 
optimal orientation occurring with u perpendicular to this symmetry axis) as 
shown in Figure 1. Meanwhile, a second critical scale, smaller than the optimal 
scale, develops (starting at about x = 0.60). As one approaches the midpoint 
of an edge of the square, the two critical scales persist until finally the value 
of medialness at the smaller critical scale becomes maximal and the optimal 
orientation of the medialness kernel rotates through 90 degrees. 

In Figure 3, the extracted medial loci for rectangles with aspect ratios 0.85 and 0.70 are illustrated. Solid lines indicate the Blum medial axis for each rectangle. As in Figure 1, the medial loci are overlaid upon plots of values for the medialness function $M$, and ridges are computed over an open subset wherein $\sigma_0$ is large and the optimal scale surface is connected. Our computations indicate that convexity ridges for medialness computed from pullback metrics branch in much the same way as the Blum medial axis does. Our failure to detect branches of the medial axis for smaller aspect ratios is due to the nature of the kernel employed for these computations, a kernel that over-emphasizes long parallel sides in object boundaries and under-emphasizes corners.




Fig. 3. The medial axes and ridges of medialness for rectangles having aspect ratio 
0.85 and 0.70. 
Abstract. The maximal convexity ridge is not well suited for the analysis of medial functions or, it can be argued, for the analysis of any function that is created via convolution with a kernel based on the Gaussian. In its place one should use the maximal scale ridge, which takes scale’s distinguished role into account. We present the local geometric structure of the maximal scale ridge of smooth and Gaussian blurred functions, a result that complements recent work on scale selection. We also discuss the subdimensional maxima property as it relates to the maximal scale ridge, and we prove that a generalized maximal parameter ridge has the subdimensional maxima property as well.



1 Introduction 

One of the central tasks in the field of computer vision is the analysis of greyscale 
images with a view toward extracting geometric loci that are intrinsic to the 
scene. Such analysis includes, for example, edge detection [1], skeleton extraction 
(e.g., via cores [5]), and ridge extraction [11], [4]. In [7], we, along with Pizer and
Keller, take the position that one can think of the geometric loci in an image 
as the height ridges of some function. This function is derived from an image’s 
pixel intensity function by means of convolution with kernels that may involve, 

M. Nielsen et al. (Eds.): Scale-Space’99, LNCS 1682, pp. 93-104, 1999.

© Springer-Verlag Berlin Heidelberg 1999







J. Miller and J. Furst 



in addition to the image’s spatial variables, orientation and scale parameters. In [7] we also reported preliminary results [10] [13] on the local generic structure of maximal convexity ridges of pixel intensity functions [2] on and their related relative critical sets. Those results appeared unrelated to the material that preceded them, insofar as the maximal convexity ridge cannot distinguish scale or orientation from any other function variable. They are nevertheless important to computer vision, because they provide the standard against which the success of (maximal convexity) ridge extraction methods can be judged. It is in this same spirit that we present structure results for the maximal scale ridge, a ridge that takes scale’s distinguished role into account [4], [5]. We report its generic local geometric structure both for smooth functions and for functions derived from a pixel intensity function $I: \mathbb{R}^2 \to \mathbb{R}$ via convolution with a particular medial kernel. We conclude the paper with a discussion of the subdimensional maxima property as it relates to ridge definitions that involve multiple parameters.

Recall that the (one-dimensional) maximal convexity ridge of a $C^2$-function $f$ defined on an open subset $U \subset \mathbb{R}^3$ is defined as follows. Let $\lambda_1(x) < \lambda_2(x) < \lambda_3(x)$ be the eigenvalues of the $3 \times 3$ Hessian matrix $H(f)(x)$, and let $v_1(x)$ and $v_2(x)$ be unit eigenvectors associated to the first two eigenvalues. The point $x \in U$ lies on the ridge if and only if $\nabla f \cdot v_i = 0$ for $i = 1, 2$ and $\lambda_2 < 0$ at $x$.
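The pointwise test in this definition is easy to state in code. The sketch below is only an illustration. It takes the gradient and Hessian at a point as given, leaving open how they are estimated. The name `on_convexity_ridge` and the tolerance are hypothetical.

```python
import numpy as np


def on_convexity_ridge(grad, hess, tol=1e-8):
    """Maximal convexity ridge test for f on an open subset of R^3.

    grad: gradient of f at x (length 3); hess: 3x3 Hessian H(f)(x).
    Returns True when grad . v_i = 0 (within tol) for the eigenvectors v_1, v_2
    of the two smallest eigenvalues and lambda_2 < 0. Assumes the eigenvalues
    are distinct, so that v_1 and v_2 are well defined.
    """
    lam, V = np.linalg.eigh(np.asarray(hess, dtype=float))  # ascending order
    grad = np.asarray(grad, dtype=float)
    slopes = [abs(grad @ V[:, i]) for i in range(2)]
    return bool(lam[1] < 0 and max(slopes) < tol)
```

For $f(x, y, z) = -x^2 - 2y^2 + z$, the Hessian is $\mathrm{diag}(-2, -4, 0)$ everywhere and the $z$-axis is the ridge. On the axis the gradient $(0, 0, 1)$ is orthogonal to $v_1$ and $v_2$. At $(0.1, 0, z)$ the gradient $(-0.2, 0, 1)$ is not, so the test fails there.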

These conditions are not enough to guarantee that $f(x)$ is a local maximum value of $f$, but they are enough to guarantee that $f(x)$ is locally a maximum value of $f|_W$, the restriction of $f$ to the plane $W(x) = \mathrm{span}(v_1(x), v_2(x))$ [7], [13], [3]. As discussed in [7], this geometric property characterizes what it means for $x$ to be an abstract ridge point of $f$. We will follow Kalitzin [9] and call this the subdimensional maximum property.

As was noted in [7], the height ridge as defined in [7] need not have the 
subdimensional maxima property. Therefore, when we use this definition as a 
basis for the creation of a specialized ridge definition we must be careful to 
verify that the newly defined ridge has the subdimensional property. We will 
do so for the maximal scale ridge and, at the end of the paper, we show that 
under certain conditions multiparameter ridges have the subdimensional maxima 
property. 

To motivate interest in a ridge definition that treats scale as a distinguished 
parameter, consider how one uses ridge theory to identify the shape skeleton 
of objects in a greyscale image. Let $I: \mathbb{R}^2 \to \mathbb{R}$ represent the pixel intensity function of a greyscale image, and let $K(x,\sigma) = -\sigma^2 \Delta G(x,\sigma)$, where $G(x,\sigma)$ is the standard Gaussian kernel with standard deviation $\sigma$, which we call scale. The convolution $I * K$ is a $C^\infty$ function called a medial function, because the value $I * K(x,\sigma)$ reflects how well the point $x$ is in the center of a figure of width $\sigma$ in the original image.
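The medial function can be sampled with ordinary Gaussian-derivative filtering. The sketch below makes several assumptions. It uses separable sampled kernels of a normalized Gaussian truncated at 4 standard deviations, reflection at the image borders, and a synthetic vertical bar. A library routine such as SciPy's `scipy.ndimage.gaussian_laplace` could replace the hand-written filter. The helper names are hypothetical. For an idealized infinitely long bar of half-width $a$, the value at the bar's center is proportional to $\sigma^{-1}e^{-a^2/2\sigma^2}$, which is largest at $\sigma = a$.

```python
import numpy as np


def gaussian_kernels(sigma, truncate=4.0):
    """Sampled 1D normalized Gaussian g and its second derivative g2."""
    r = int(truncate * sigma + 0.5)
    t = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    g2 = (t ** 2 / sigma ** 4 - 1.0 / sigma ** 2) * g
    return g, g2


def filter_separable(image, kx, ky):
    """Convolve along x (axis 1) with kx and along y (axis 0) with ky.

    Borders are handled by reflection; the image must be larger than the
    kernel radius in both directions.
    """
    r = len(kx) // 2
    p = np.pad(image, r, mode="reflect")
    p = np.apply_along_axis(np.convolve, 1, p, kx, mode="same")
    p = np.apply_along_axis(np.convolve, 0, p, ky, mode="same")
    return p[r:-r, r:-r]


def medial_function(image, sigma):
    """I * K with K = -sigma^2 times the Laplacian of a normalized Gaussian."""
    g, g2 = gaussian_kernels(sigma)
    laplacian = filter_separable(image, g2, g) + filter_separable(image, g, g2)
    return -sigma ** 2 * laplacian
```

As a usage example, consider a bar 21 pixels wide. Sampling `medial_function(image, s)` at the center pixel over a range of `s` gives a profile that peaks near $\sigma \approx 10.5$ pixels. That is the scale of the middle circle in Fig. 1.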

By way of example, in Fig. 1 we choose a point $x$ in the center of a vertical bar. Of the three circles centered at $x$, the middle circle seems to be the largest circle centered at $x$ that sits within the bar. Its radius determines a scale $\sigma_0$ at which medialness at $x$ will be locally maximal. Once we know the scale
value at which medialness at x is locally maximal, we can analyze the spatial 
directions to determine the skeleton’s tangent and normal directions at x. In 
this example, the scale parameter and the spatial variables play vitally distinct 
roles. This compels us to develop a ridge definition in which the scale and spatial 
components are treated separately. More generally, insofar as scale-space based 
analysis uses filters built on the Gaussian, and scale parameterizes the blurring 
away of detail, any geometric image analysis in scale space ought to treat spatial 
and scale variables differently. In what follows, we define a variant of the maximal 




Fig. 1. The three circles represent scales at which medialness is measured. Medialness 
is highest at the scale represented by the middle circle because, of the three concentric 
circles, it fits most snugly into the bar. 
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convexity ridge called the maximal scale ridge that does just that. This ridge 
will be defined for a function of three variables, two spatial and one scale. The 
definition is closely related to the criteria Lindeberg uses for his method of 
automatic scale selection [11]. (This ridge definition can be used in any setting 
involving two spatial variables and one distinguished parameter. Moreover, it 
can be easily adapted to contexts where one is interested in minimal parameters 
values. For these reasons, one might choose to call the ridge under discussion the 
optimal parameter ridge.) We will observe that (1) the the maximal scale ridge is 
defined in terms of a maximal convexity ridge, and (2) this relationship does not 
imply that the two varieties of ridge have the same local generic structure. We 
shall observe that the ridge- valley-connector curves of these two are remarkably 
different. Reasons for the difference are explained by classical catastrophe theory. 
Finally, we shall prove that this new definition indeed gives us a ridge; every point 
in the locus is a subdimensional maxima in a well defined sense. 

2 The Definition 

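Let $f(\mathbf{x},\sigma)$ be a smooth function on $U$, an open subset of $\mathbb{R}^2 \times \mathbb{R}_{++}$
(scale space). The demand that ridge points $(\mathbf{x},\sigma)$ be local maxima in scale gives
the following necessary condition:

$$\text{at } (\mathbf{x},\sigma): \quad \frac{\partial f}{\partial \sigma} = 0 \quad\text{and}\quad \frac{\partial^2 f}{\partial \sigma^2} < 0. \qquad (1)$$

Because the set

$$X = \left\{ (\mathbf{x},\sigma) \in \mathbb{R}^2 \times \mathbb{R}_{++} \;\middle|\; \frac{\partial f}{\partial \sigma}(\mathbf{x},\sigma) = 0 \right\} \qquad (2)$$

is generically smooth, (1) defines a smooth surface $X_M \subset \mathbb{R}^2 \times \mathbb{R}_{++}$ called the
maximal scale surface. Consequently, all maximal scale ridge points must lie in
the surface $X_M$.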
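On sampled data, condition (1) is usually checked through sign changes of $\partial f/\partial\sigma$ along the scale axis. The Python sketch below is a minimal illustration under our own assumptions: $f$ has been sampled as a stack of images at increasing scales with shape `(n_sigma, ny, nx)`, and the hypothetical helper `maximal_scale_mask` flags samples where $f$ rises into a scale sample and does not rise out of it, a discrete stand-in for $\partial f/\partial\sigma = 0$ and $\partial^2 f/\partial\sigma^2 < 0$.

```python
import numpy as np

def maximal_scale_mask(stack):
    """Flag samples that are local maxima of f along the scale axis.

    stack[k] holds f(., ., sigma_k) on an increasing grid of scales, so the
    array has shape (n_sigma, ny, nx). A sample is flagged when f rises into
    it from the smaller scale and does not rise out of it to the larger one:
    a discrete stand-in for df/dsigma = 0 and d2f/dsigma2 < 0.
    """
    f = np.asarray(stack, dtype=float)
    mask = np.zeros(f.shape, dtype=bool)
    rising = f[1:-1] > f[:-2]
    not_rising_after = f[1:-1] >= f[2:]
    mask[1:-1] = rising & not_rising_after
    return mask

# Synthetic f = -(sigma - s0(x))^2 with a preferred scale s0 per pixel.
sigmas = np.linspace(0.5, 3.0, 51)
s0 = np.array([1.0, 1.5, 2.0])                      # one row of three pixels
stack = -(sigmas[:, None, None] - s0[None, None, :]) ** 2
mask = maximal_scale_mask(stack)
print(sigmas[np.argmax(mask[:, 0, :], axis=0)])      # recovers s0 on this grid
```

Where the number of flagged scales changes between neighbouring pixels, a maximum in scale has merged with a minimum, which is a sign that the sampled surface is crossing the fold discussed in the text.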

One of the computational advantages of defining a critical scale surface is the
dimensional reduction it offers. We break the problem of finding a ridge in a
three-dimensional space into two steps: finding a surface in $\mathbb{R}^2 \times \mathbb{R}_{++}$ and
then finding a ridge on the surface. The two problems together are simpler than
the original problem. We know of three approaches that exploit this reduction
in dimension to calculate ridges on $X_M$.
1. Calculate height ridges of $f|X_M$

2. Calculate height ridges using approximate derivatives calculated by selectively ignoring the change in scale on $X_M$

3. Calculate height ridges of $f$ on coordinate patches mapping $\mathbb{R}^2 \times \{0\} \subset \mathbb{R}^2 \times \mathbb{R}_{++}$ to $X_M$

Eberly demonstrates how height ridges of $f|X_M$ can be computed using the
intrinsic geometry of $X_M$ [4]; however, even for the case of a surface in $\mathbb{R}^3$
this approach is computationally expensive and has never been implemented.
Kalitzin [9] implements (2) using a maximal orientation surface (rather than
a maximal scale surface) and reports good results for a single test case. This
approach, however, disregards the effects that the geometry of $X_M$ has on the
derivative calculations. Fritsch uses approach (3) because it is computationally
tractable and incorporates the geometry of $X_M$ into the computation. The
rest of this paper is devoted to examining the mathematics of (3) and what it
tells us about the local geometric structure of the maximal convexity ridge.

Eberly described a mathematical approach one could use to extract the maximal
scale ridges from an image. He showed that the maximal scale ridge can be
computed using a parameterization of $X_M$ from the spatial subspace $\mathbb{R}^2 \times \{0\}$
[4]. To be more precise, he used a coordinate patch $\phi : \mathbb{R}^2 \to X_M$ to define
$g(\mathbf{x}) = f \circ \phi(\mathbf{x})$, and he claimed that if $R \subset \mathbb{R}^2$ is the ridge set of $g = f|X_M$,
then $\phi(R) \subset X_M$ is the maximal scale ridge of $f$. Fritsch used Eberly’s
mathematics to find the cores¹ of figures in portal images [5], and he did so with
notable success, but some of what he saw bothered him. His extraction method
relied on a ridge tracking algorithm, which works best when the ridge is a set of
long unbroken curve segments. Given the figures in his images, he expected the
cores to be exactly that. Contrary to his expectations, the maximal scale cores
he extracted had a tendency to end abruptly (at points he certainly expected a
shape skeleton to pass through smoothly). Moreover, after searching the vicinity
of the core’s endpoint he would find the endpoint of another core. If one ignored
the gap between the core segments, this new core segment essentially picked up
where the previous segment left off. Fritsch knew of the structure classification
for maximal convexity ridges and cores in $\mathbb{R}^2$ and $\mathbb{R}^3$ [3], [13], and found nothing
in that context that could explain this consistent aberrant structure. Going
back to [4], one sees that Eberly does not address the issue of the existence of
the parameterizations he needs to establish his theory. This observation led us
to an explanation of the “gap” phenomenon Fritsch observed.

¹ In general, cores are ridges of functions created by filtering greyscale images for
medialness. Fritsch’s cores were maximal scale ridges of medialness. See [5].

At each point $(\mathbf{x},\sigma)$ on the maximal scale ridge, $\nabla f$ is necessarily orthogonal
to a subspace of the tangent space $T_{(\mathbf{x},\sigma)}X_M$ on which $f$ has a local maximum at
$(\mathbf{x},\sigma)$. Eberly showed that this subspace can be identified using a local
parameterization $\phi : U \to \mathbb{R}^2 \times \mathbb{R}_{++}$, where $U$ is an open subset of the spatial
subspace $\mathbb{R}^2 \times \{0\}$. But such a parameterization is not guaranteed to exist.

Points at which such a local parameterization exists are characterized as those
points at which the projection $\pi : X \to \mathbb{R}^2$ is a submersion. An elementary result
from Thom’s catastrophe theory implies that the surface $X$ contains a subset of
codimension 1 on which $\pi$ fails to be a submersion. This curve on $X$ is called the
fold curve, and the maximal scale ridge ends abruptly where it meets the fold curve.
(See Fig. 2.) We must note three things at this point. First, at points away
from the fold, Eberly’s results hold and the maximal scale ridge in $\mathbb{R}^2 \times \mathbb{R}_{++}$ is
diffeomorphic to the maximal convexity ridge in $\mathbb{R}^2$ (see Theorem 5, properties
1-3).

Theorem 1 (Eberly). Suppose at $(\mathbf{x},\sigma) \in X$ there is a local parametrization
$\phi : U \to \mathbb{R}^2 \times \mathbb{R}_{++}$ of $X$ with $\mathbf{x} \in U \subset \mathbb{R}^2$. The point $(\mathbf{x},\sigma)$ is a maximal scale
ridge point of $f$ if and only if the point $\mathbf{x}$ is a maximal convexity ridge point of
$f \circ \phi : U \to \mathbb{R}$.

Second, the fold is characterized as the set of points $(\mathbf{x},\sigma) \in X$ at which
$\partial^2 f/\partial\sigma^2$ vanishes [12]. This is exactly the boundary of $X_M$, on which the
maximal scale ridge is undefined. Third, although Damon showed how maximal
convexity ridges, when viewed as relative critical sets, can be continued as
connector curves, this fold singularity means we cannot call on these results to
continue the maximal scale ridge.

Note 2. It is a straightforward exercise to verify that those points $(\mathbf{x},\sigma)$ that are
maximal scale ridge points according to Theorem 1 are subdimensional maxima
with respect to the plane $\mathrm{span}(\partial/\partial\sigma,\, d\phi(\mathbf{v}))$, where $\mathbf{v}$ is an eigenvector for the most
negative eigenvalue of $H(f \circ \phi)$. In the last section of the paper we will establish
the subdimensional maxima property for an analogously defined multiparameter
ridge.
(a) (b)

Fig. 2. Notice the geometry at the fold curve of the critical scale surface pictured in
Fig. 2a. A maximal scale ridge on $X$ can be expected to intersect the
fold curve and come to an end. This is the behavior Fritsch observed while tracking
the ridge. Fig. 2b shows how the ridge can be continued using a connector curve (see
Definition 6), which may lead to a nearby ridge segment. This illustrates why Fritsch
saw small gaps in his maximal scale ridges.



To explain the phenomenon Fritsch observed, and to show how the ridge can be
continued as a connector curve, the first author used the mathematical machinery
Eberly employed to prove Theorem 1 together with techniques used to establish
properties of relative critical sets [13].

One of Eberly’s innovations in the proof of Theorem 1 was his use of a $3 \times 3$
generalized eigensystem.

Definition 3. The generalized eigensystem of $3 \times 3$ matrices $M$ and $N$ consists
of vectors $\mathbf{v} \in \mathbb{R}^3 \setminus \{0\}$ and scalars $\gamma \in \mathbb{R}$ that satisfy the matrix equation
$M\mathbf{v} = \gamma N\mathbf{v}$.



Eberly used $M = H(f)$ and $N = P$, the matrix representation of $\pi$. Let $\gamma_1 < \gamma_2$
be the two generalized eigenvalues of $H(f)$ [13] and let $\mathbf{v}_1$ and $\mathbf{v}_2$ be
corresponding unit generalized eigenvectors. With this data we can give an alternate
definition for the maximal scale ridge.
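To see why there are only two generalized eigenvalues, and why one of them misbehaves near the fold, take coordinates $(x, y, \sigma)$ and assume $P = \mathrm{diag}(1, 1, 0)$, the matrix of the projection $\pi$ onto the spatial coordinates. Write $H(f)$ in blocks with spatial part $A$, mixed column $\mathbf{b}$ and scale entry $c = \partial^2 f/\partial\sigma^2$. For $\mathbf{v} = (\mathbf{u}, t)$ the scale row of $H\mathbf{v} = \gamma P\mathbf{v}$ forces $t = -\mathbf{b}\cdot\mathbf{u}/c$, which leaves the $2 \times 2$ eigenproblem for the Schur complement $A - \mathbf{b}\mathbf{b}^T/c$. On $X$ this matrix equals the spatial Hessian of $f \circ \phi$, which is one way to see the link with Theorem 1, and it becomes unbounded as $c \to 0$ at the fold. The Python sketch below computes the system this way; the function name and the test matrix are our own.

```python
import numpy as np

def generalized_eigensystem(H):
    """Finite solutions of H v = gamma P v with P = diag(1, 1, 0).

    H is the 3x3 Hessian of f ordered (x, y, sigma). Requires
    c = d2f/dsigma2 != 0, i.e. a point off the fold.
    """
    H = np.asarray(H, dtype=float)
    A, b, c = H[:2, :2], H[:2, 2], H[2, 2]
    if np.isclose(c, 0.0):
        raise ValueError("d2f/dsigma2 vanishes: the point lies on the fold")
    # The sigma row of H v = gamma P v gives t = -(b . u) / c for v = (u, t);
    # substituting leaves the 2x2 Schur complement S = A - b b^T / c.
    S = A - np.outer(b, b) / c
    gammas, U = np.linalg.eigh(S)            # ascending, so gamma_1 <= gamma_2
    V = np.vstack([U, -(b @ U) / c])         # lift each column u to v = (u, t)
    V /= np.linalg.norm(V, axis=0)           # unit generalized eigenvectors
    return gammas, V

H = np.array([[-2.0, 0.3, 0.5],
              [0.3, -1.0, 0.2],
              [0.5, 0.2, -1.5]])
gammas, V = generalized_eigensystem(H)
P = np.diag([1.0, 1.0, 0.0])
print(np.allclose(H @ V, (P @ V) * gammas))     # True: H v_k = gamma_k P v_k

H_near_fold = H.copy()
H_near_fold[2, 2] = -1e-4                       # d2f/dsigma2 almost zero
print(generalized_eigensystem(H_near_fold)[0])  # one eigenvalue is very large
```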
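Definition 4 (Maximal Scale Ridge). Let $U \subset \mathbb{R}^2 \times \mathbb{R}_{++}$ be open. A point
$(\mathbf{x},\sigma) \in U$ is a maximal scale ridge point of $f : U \to \mathbb{R}$ if and only if at $(\mathbf{x},\sigma)$,

1. $\dfrac{\partial f}{\partial \sigma} = 0$ and $\dfrac{\partial^2 f}{\partial \sigma^2} < 0$, and

2. $\nabla f \cdot \mathbf{v}_1 = 0$ and $\mathbf{v}_1^T H(f)\, \mathbf{v}_1 = \gamma_1 < 0$.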
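A pointwise test of Definition 4 can be sketched as follows. The sketch assumes that the gradient and Hessian of $f$ at the point are already available (for instance from Gaussian derivative filters) and ordered $(x, y, \sigma)$, and it checks the equalities up to a tolerance `tol` of our own choosing; on sampled data one would instead look for zero crossings of $\partial f/\partial\sigma$ and $\nabla f \cdot \mathbf{v}_1$ between neighbouring samples. Because $\mathbf{v}_1^T H(f)\mathbf{v}_1$ has the sign of $\gamma_1$, the sketch tests the sign of $\gamma_1$. The function name is hypothetical.

```python
import numpy as np

def is_maximal_scale_ridge_point(grad, H, tol=1e-8):
    """Pointwise test of Definition 4.

    grad = (f_x, f_y, f_sigma) and H = 3x3 Hessian of f, both ordered
    (x, y, sigma). Equalities are tested up to the tolerance tol.
    """
    grad = np.asarray(grad, dtype=float)
    H = np.asarray(H, dtype=float)
    f_s, f_ss = grad[2], H[2, 2]
    # Condition 1: a local maximum of f in scale.
    if abs(f_s) > tol or f_ss >= 0:
        return False
    # Generalized eigensystem of (H, diag(1, 1, 0)) via the Schur complement;
    # f_ss < 0 guarantees we are off the fold.
    A, b = H[:2, :2], H[:2, 2]
    gammas, U = np.linalg.eigh(A - np.outer(b, b) / f_ss)
    u1 = U[:, 0]                                  # eigenvector for gamma_1 (smallest)
    v1 = np.append(u1, -(b @ u1) / f_ss)
    v1 /= np.linalg.norm(v1)
    # Condition 2: grad f . v1 = 0 and gamma_1 < 0 (v1^T H v1 has the sign of gamma_1).
    return bool(abs(grad @ v1) <= tol and gammas[0] < 0)

# Synthetic f = y - 2x^2 - (sigma - 1)^2, whose ridge is x = 0 at sigma = 1.
H = np.diag([-4.0, 0.0, -2.0])
print(is_maximal_scale_ridge_point([0.0, 1.0, 0.0], H))   # True  (x = 0)
print(is_maximal_scale_ridge_point([-0.4, 1.0, 0.0], H))  # False (x = 0.1)
```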



Adopting this definition has the advantage of allowing us to compute the maximal
scale ridge without explicitly using $\phi$. The fold curve still causes problems
in this new setting because one of the generalized eigenvalues is unbounded in
a neighborhood of the fold [13]. At this point we employ Catastrophe Theory
and its theory of unfoldings [12], which tells us the form the derivatives of $f$
may take near the fold curve. Using this derivative information, the first author
determined the asymptotic behavior of the unbounded generalized eigenvalue
and used it to define preridge maps [3] and relative critical surfaces that
allowed us to complete Eberly’s geometric description of the maximal scale ridge
in $\mathbb{R}^2 \times \mathbb{R}_{++}$ [13].

By using the generalized eigensystem and techniques used to establish the
generic structure of relative critical sets, the first author proved that the generic
structure of the maximal scale ridge in $\mathbb{R}^2 \times \mathbb{R}_{++}$ differs from that of the maximal
convexity ridge in $\mathbb{R}^2$ only insofar as the maximal scale ridge comes to an end
at the singularities of $\pi|X$.



3 The Properties 

By using elements of Eberly’s proof of Theorem 1, methods used in the analysis of
relative critical sets, and Catastrophe Theory, we proved the following structure
theorem for the maximal scale ridge of smooth functions on $U \subset \mathbb{R}^2 \times \mathbb{R}_{++}$ [13].

Theorem 5. For $U \subset \mathbb{R}^2 \times \mathbb{R}_{++}$, there is a residual set of $f \in C^\infty(U)$ whose maximal
scale ridges have the following properties:

1. The ridge is a finite collection of smooth, embedded one-dimensional
submanifolds which may have boundary. In particular, they neither cross nor
branch.

2. The ridge passes smoothly through critical points of $f$, but such critical points
are Morse critical points of $f$ with Hessian having distinct, nonzero eigenvalues.

3. Components of the ridge have a boundary point where $\gamma_1 = 0$ or $\gamma_1 = \gamma_2$.

4. In addition, components of the ridge can have a boundary point at fold
singularities of $\pi|X$.

If $C \subset U$ is compact, the set of $f \in C^\infty(U)$ that exhibit these properties on $C$ is
open and dense.

Properties (1) through (3) are analogous to those of the maximal convexity ridge
in $\mathbb{R}^2$, and property (4) is caused by the fold curve. It should be noted that
these results continue to hold for maximal scale ridges of medial functions on
$\mathbb{R}^2 \times \mathbb{R}_{++}$ generated as the convolution of a greyscale pixel intensity function
$I : \mathbb{R}^2 \to \mathbb{R}$ with the medial kernel $-\sigma^2 \Delta G(\mathbf{x},\sigma)$ [13].

Furthermore, the techniques used to prove Theorem 5 lead to a natural definition of
relative critical sets in $\mathbb{R}^2 \times \mathbb{R}_{++}$ that distinguish the role of scale. In particular,
we obtain a ridge-valley-connector set that facilitates the use of ridge tracking
methods for extracting cores from two-dimensional greyscale images.

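Definition 6. Let $(\mathbf{x},\sigma) \in X$, let $\gamma_1 < \gamma_2$ be the generalized eigenvalues of
$H(f)$, and let $\mathbf{v}_1$ and $\mathbf{v}_2$ be their generalized eigenvectors. Then $(\mathbf{x},\sigma)$ is a

1. r-connector point of $f$ if, at $(\mathbf{x},\sigma)$, $\nabla f \cdot \mathbf{v}_1 = 0$ and $\gamma_1 > 0$,

2. valley point of $f$ if, at $(\mathbf{x},\sigma)$, $\nabla f \cdot \mathbf{v}_2 = 0$, $\partial^2 f/\partial\sigma^2 > 0$ and $\gamma_2 > 0$,

3. v-connector point of $f$ if, at $(\mathbf{x},\sigma)$, $\nabla f \cdot \mathbf{v}_2 = 0$ and $\gamma_2 < 0$.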

Notice that when the maximal scale ridge hits the fold curve it is continued
by an r-connector curve (see Fig. 2). This curve can be followed (possibly through
some transitions to other connector curves) to an intersection with another fold
curve, at which point the curve becomes a maximal scale ridge curve again.
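Definitions 4 and 6 together give a pointwise labelling of $X$ that a tracker could consult as it passes from ridge segments to connector segments. The Python sketch below is our own illustration, not the authors’ tracking code: it builds the generalized eigensystem through the same Schur-complement elimination of the scale component (valid only off the fold, where $\partial^2 f/\partial\sigma^2 \neq 0$), treats equalities up to an assumed tolerance, and returns every label whose conditions hold. The function name `classify_point` is hypothetical.

```python
import numpy as np

def classify_point(grad, H, tol=1e-8):
    """Labels from Definitions 4 and 6 that hold at a point of X.

    grad and H are ordered (x, y, sigma). Returns [] off X or on the fold.
    """
    grad = np.asarray(grad, dtype=float)
    H = np.asarray(H, dtype=float)
    f_ss = H[2, 2]
    if abs(grad[2]) > tol or abs(f_ss) <= tol:
        return []
    A, b = H[:2, :2], H[:2, 2]
    gammas, U = np.linalg.eigh(A - np.outer(b, b) / f_ss)  # gamma_1 <= gamma_2
    V = np.vstack([U, -(b @ U) / f_ss])
    V /= np.linalg.norm(V, axis=0)                        # columns v_1, v_2
    g1, g2 = gammas
    perp1 = abs(grad @ V[:, 0]) <= tol
    perp2 = abs(grad @ V[:, 1]) <= tol
    labels = []
    if perp1 and f_ss < 0 and g1 < 0:
        labels.append("ridge")
    if perp1 and g1 > 0:
        labels.append("r-connector")
    if perp2 and f_ss > 0 and g2 > 0:
        labels.append("valley")
    if perp2 and g2 < 0:
        labels.append("v-connector")
    return labels

print(classify_point([0, 1, 0], np.diag([-4.0, 0.0, -2.0])))  # ['ridge']
print(classify_point([0, 1, 0], np.diag([3.0, 5.0, -2.0])))   # ['r-connector']
print(classify_point([1, 0, 0], np.diag([-5.0, 2.0, 1.0])))   # ['valley']
```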

4 Generalized Optimal Parameter Ridges 

The previous sections of this paper have dealt with the maximal scale surface
in great detail. However, there are instances in which we may want to deal with
other parameters. Kalitzin has already experimented with optimal orientation,
as did Canny in his definition of edges. We motivated distinguishing scale in
the case of medialness measurements and, more generally, Gaussian derivative
measurements. However, Gaussian filters of two or more dimensions may have
more than one scale component. Further, the second author [6] has described
medialness measurements that use both scale and orientation. In all cases
described, the distinguished role of orientation and scale leads naturally to optimal
parameter ridges. To define optimal parameter ridges in arbitrary spaces, let $\mathbb{R}^n$
be a Euclidean space (typically the domain of a greyscale image) and let $\mathbb{R}^p$ be
the domain of the parameters.

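Let $V \subset \mathbb{R}^n \times \mathbb{R}^p$ be open and let $f : V \to \mathbb{R}$ be smooth. Define $X_M$ to be
the set of $(\mathbf{x},\sigma) \in V$, with $\sigma = (\sigma_1, \dots, \sigma_p)$, where

$$\frac{\partial f}{\partial \sigma_i} = 0 \quad\text{and}\quad \left( \frac{\partial^2 f}{\partial \sigma_i \, \partial \sigma_j} \right) \text{ is negative definite.}$$

By the Generalized Maximal Rank theorem ([8], Theorem 4.4) and results in
[13], $X_M$ is generically (with respect to the space of all smooth functions on $V$)
a smooth manifold called the maximal parameter manifold.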
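The two conditions defining $X_M$ can be tested at a single sample as sketched below, assuming the parameter gradient and the $p \times p$ parameter Hessian are given. Negative definiteness is checked by attempting a Cholesky factorization of the negated matrix, which NumPy’s `numpy.linalg.cholesky` rejects with `LinAlgError` unless the (symmetric) input is positive definite. For $p = 1$ the test reduces to condition (1). The helper name and tolerance are our own.

```python
import numpy as np

def on_maximal_parameter_manifold(grad_p, H_pp, tol=1e-8):
    """Check the two conditions defining X_M at one sample point.

    grad_p: derivatives df/dsigma_i (length p); H_pp: symmetric p x p matrix
    of d2f/dsigma_i dsigma_j. H_pp is negative definite exactly when a
    Cholesky factorization of -H_pp exists.
    """
    grad_p = np.atleast_1d(np.asarray(grad_p, dtype=float))
    H_pp = np.atleast_2d(np.asarray(H_pp, dtype=float))
    if np.max(np.abs(grad_p)) > tol:
        return False
    try:
        np.linalg.cholesky(-H_pp)
    except np.linalg.LinAlgError:
        return False
    return True

print(on_maximal_parameter_manifold([0.0, 0.0], [[-2.0, 0.5], [0.5, -1.0]]))  # True
print(on_maximal_parameter_manifold([0.0, 0.0], [[-2.0, 0.0], [0.0, 1.0]]))   # False
```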



Definition 7. Suppose at $(\mathbf{x},\sigma)$ there is a local parameterization $\phi : U \to \mathbb{R}^n \times \mathbb{R}^p$
of $X_M$ with $U \subset \mathbb{R}^n \times \{0\}$. The point $(\mathbf{x},\sigma)$ is a maximal parameter ridge
point of $f$ if and only if the point $\mathbf{x}$ is a maximal convexity ridge point of $f \circ \phi$.

A special case of this definition is the maximal scale ridge, which enjoys the
subdimensional maxima property. It is not clear, however, that the maximal
parameter ridge has this property. Moreover, there are instances (see [7]) where
maximal parameter values determine geometrically important spatial subspaces
that are not necessarily eigenspaces of the Hessian of $f \circ \phi$. When this is the
case, the ridge that is natural in that context is not compatible with Definition 7.
However we choose to define this new class of ridge, the definition must
imply that the ridge has the subdimensional maxima property. What follows is
a definition that allows for such distinguished spatial subspaces and a proof that
the ridges so defined do in fact enjoy the subdimensional maxima property.

Let $U' \subset \mathbb{R}^n$ be open, $U = U' \times \mathbb{R}^p$, and $f : U \to \mathbb{R}$ be smooth. Define $X_M$,
the maximal parameter manifold, as above. Let $(\mathbf{x},\sigma) \in X_M$ be a point at which a local
parameterization $\phi$ of $X_M$ from $U'$ exists. Finally, our idea for a ridge definition
must specify a subspace $W(\mathbf{x}) \subset \mathbb{R}^n$ (e.g., in the case of the maximal scale ridge
$W(\mathbf{x})$ was a one-dimensional eigenspace of $H(f \circ \phi)$).

Definition 8. The point $(\mathbf{x},\sigma)$ is a generalized maximal parameter ridge point
of $f$ with respect to $W(\mathbf{x})$ if and only if $f \circ \phi$ has the subdimensional maxima
property on $W(\mathbf{x})$.
For $(\mathbf{x},\sigma)$ to be a true ridge point, $f$ must have the subdimensional maxima
property on $W(\mathbf{x}) \times \mathbb{R}^p$. It is not immediately clear from Definition 8 that this
is the case. We conclude this paper with its proof.

Theorem 9. Every point on the generalized maximal parameter ridge of $f$ as
defined above is a subdimensional maximum of $f$ with respect to the subspace
$W(\mathbf{x}) \times \mathbb{R}^p$.



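Proof. Let $H(f|(W \times \mathbb{R}^p))$ be defined as follows:

$$H(f|(W \times \mathbb{R}^p)) = \begin{pmatrix} \left(\dfrac{\partial^2 f}{\partial w_i \, \partial w_j}\right) & \left(\dfrac{\partial^2 f}{\partial w_i \, \partial p_j}\right) \\[2ex] \left(\dfrac{\partial^2 f}{\partial p_i \, \partial w_j}\right) & \left(\dfrac{\partial^2 f}{\partial p_i \, \partial p_j}\right) \end{pmatrix} = \begin{pmatrix} H_W & D \\ D^T & H_p \end{pmatrix}. \qquad (3)$$

Because $\nabla(f|\mathbb{R}^p)$ vanishes on $X_M$, and because $\nabla((f \circ \phi)|W)$ is defined to vanish
on optimal parameter ridge points, $\nabla(f|(W \times \mathbb{R}^p))$ also vanishes on optimal
parameter ridge points.

Let $\mathbf{w}$ be defined as $\alpha\mathbf{u} + \beta\mathbf{v}$, with $\mathbf{u} \in W$ and $\mathbf{v} \in \mathbb{R}^p$. Then

$$\mathbf{w}^T H(f|(W \times \mathbb{R}^p))\,\mathbf{w} = \alpha\mathbf{u}^T H_W\, \alpha\mathbf{u} + \alpha\mathbf{u}^T D\, \beta\mathbf{v} + \beta\mathbf{v}^T D^T \alpha\mathbf{u} + \beta\mathbf{v}^T H_p\, \beta\mathbf{v}.$$

And because $H((f \circ \phi)|W)$ is negative definite, for $\alpha\mathbf{u} \neq 0$

$$\mathbf{w}^T H(f|(W \times \mathbb{R}^p))\,\mathbf{w} < \alpha\mathbf{u}^T \big(H_W - H((f \circ \phi)|W)\big)\, \alpha\mathbf{u} + \alpha\mathbf{u}^T D\, \beta\mathbf{v} + \beta\mathbf{v}^T D^T \alpha\mathbf{u} + \beta\mathbf{v}^T H_p\, \beta\mathbf{v}.$$

The definition of $H((f \circ \phi)|W)$ allows the following substitution: since $\nabla(f|\mathbb{R}^p) = 0$ on $X_M$, the chain rule gives $H((f \circ \phi)|W) = H_W - D H_p^{-1} D^T$, so

$$\mathbf{w}^T H(f|(W \times \mathbb{R}^p))\,\mathbf{w} < \alpha\mathbf{u}^T D H_p^{-1} D^T \alpha\mathbf{u} + \alpha\mathbf{u}^T D\, \beta\mathbf{v} + \beta\mathbf{v}^T D^T \alpha\mathbf{u} + \beta\mathbf{v}^T H_p\, \beta\mathbf{v}.$$

Finally, the introduction of $H_p^{-1} H_p$ and an algebraic rearrangement of terms
yields

$$\mathbf{w}^T H(f|(W \times \mathbb{R}^p))\,\mathbf{w} < \left(\alpha\mathbf{u}^T D + \beta\mathbf{v}^T H_p\right) H_p^{-1} \left(\alpha\mathbf{u}^T D + \beta\mathbf{v}^T H_p\right)^T,$$

and, since $H_p$ is negative definite everywhere on $X_M$, the right-hand side is nonpositive.
When $\alpha\mathbf{u} = 0$ the quadratic form reduces to $\beta\mathbf{v}^T H_p\, \beta\mathbf{v}$, which is negative for
$\beta\mathbf{v} \neq 0$. Hence $H(f|(W \times \mathbb{R}^p))$ is also negative definite. Therefore, $f|(W \times \mathbb{R}^p)$
is locally maximal at $(\mathbf{x},\sigma)$. This completes the proof.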
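The conclusion of Theorem 9 can be spot-checked numerically. The sketch below is an illustration only, with arbitrary dimensions and a fixed random seed of our choosing: it draws a negative definite $H_p$, an arbitrary mixed block $D$ and a negative definite $H((f \circ \phi)|W)$, builds $H_W$ from the identity $H((f \circ \phi)|W) = H_W - D H_p^{-1} D^T$ used in the proof, and confirms that the block matrix (3) is negative definite. The helper `random_negative_definite` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
k, p = 2, 3   # dim W(x) and number of parameters (arbitrary choices)

def random_negative_definite(n):
    A = rng.standard_normal((n, n))
    return -(A @ A.T + n * np.eye(n))     # minus a symmetric positive definite matrix

H_p = random_negative_definite(p)         # parameter block, negative definite on X_M
D = rng.standard_normal((k, p))           # mixed block d2f / dw_i dp_j
S = random_negative_definite(k)           # H((f o phi)|W), negative definite at the ridge point
H_W = S + D @ np.linalg.inv(H_p) @ D.T    # spatial block consistent with the identity above

H = np.block([[H_W, D], [D.T, H_p]])      # H(f|(W x R^p)) as in (3)
print(np.linalg.eigvalsh(H).max())        # negative: f|(W x R^p) is locally maximal
```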
Abstract. In this paper we investigate scale space based structural
grouping in images. Our strategy is to detect (relative) critical point
sets in scale space, which we consider as an extended image representation.
In this way the multi-scale behavior of the original image structures
is taken into account and automatic scale space grouping and scale
selection are possible. We review a constructive and efficient topologically
based method to detect the (relative) critical points. The method is
presented for arbitrary dimensions. Relative critical point sets in a Hessian
vector frame provide us with a generalization of height ridges. Automatic
scale selection is accomplished by a proper reparameterization of
the scale axis. As the relative critical sets are in general connected
submanifolds, they provide a robust method for perceptual grouping with only
local measurements.

Key words: deep structure, feature detection, scale selection, perceptual 
grouping. 



1 Introduction 

The goal in this paper is to perform scale space based structural grouping in
images. We accomplish this by detection of the maximal response in scale space
of the desired image structures. Our strategy is to detect (relative) critical point
sets in scale space. Rather than investigating the evolution of the critical sets
of the original $D$-dimensional image across scale, we consider the scale space
as an extended image representation and detect the critical sets in this
$(D+1)$-dimensional image. In this way the multi-scale behavior of the original image
structures is taken into account and automatic scale space grouping and scale
selection are possible.

Critical points and relative critical point sets play an essential role in uncom- 
mitted image analysis as described in [11,12,13]. These topological structures 
are studied in the context of multi-scale image analysis in [6, 15, 16]. They form 
a topological “back-bone” on which the image structures are mounted. 

We introduce a non-perturbative method for detecting critical and relative
critical points. The method is based on computing a surface integral of a
functional of the gradient vector on the border of a closed neighborhood around
every image point [8,9]. This integral evaluates to zero for regular image points
and to an integer for critical points. The value and sign of this integer
discriminate between the different critical points.

The main advantage of our method for localizing critical points lies in its 
explicit and constructive nature. To illustrate this, note that in finding the zero 
crossings of a real function, the only sensible task would be to find the intervals 
where the function changes sign. The size of these intervals is the precision with 
which we are searching for the zero crossings. Our topological construction is in 
many aspects analogous to this generic example. The size of the neighborhood 
(the closed surface) around the test image point is the spatial precision with 
which we want to localize the critical point. Therefore our method is a natural 
generalization of interval mathematics to higher dimensional signals. 

Another advantage of the method is its non-perturbative nature. To compute
the integrals, we do not need to know the values of the gradient or higher order
derivatives at the point itself, but only around the given image location, as
opposed to [3,6,5,2,13,16].

In this paper we first review the detection of critical points and
relative critical point sets as introduced by [10]. The method is based on the
computation of homotopy class numbers. We show that detecting relative critical
point sets in a Hessian frame provides us with a generalization of height ridges.
We turn to the detection of critical point sets in scale space in Sect. 3. Because
the properties of Gaussian scale spaces prohibit automatic scale selection, we
deform the scale space with a form factor. We apply the method in Sect. 4
for the grouping and detection of elongated structures at a scale of maximum
response in some synthetic examples and in a medical fundus reflection image
of the eye. In the last section we discuss some practical and conceptual issues
concerning our approach.



2 Critical Points and Relative Critical Points 

Critical points are those points in an image at which the gradient vanishes,
i.e. extrema and (generalized) saddle points. We define a relative critical point
as a critical point in a subdimensional neighborhood around an image pixel.
The neighborhood can be defined in an intrinsically or extrinsically chosen
subdimensional vector frame. In this section we show how to detect critical points in
arbitrary dimensions. The detection of the relative critical points is then
straightforward, because they are critical points themselves in a subdimensional
vector frame.

For the detection of the critical points we use the topological homotopy class
numbers as introduced by [7,10]. This number reflects the behavior of the image
gradient vector in a small neighborhood of an image point. For regular points it
equals zero, whereas for critical points it has an integer value. In the simplest,
one-dimensional, case it is defined as half the difference of the sign of the signal’s
derivative taken from the right side and the left side of the point. For regular
points the topological number equals zero, for local maxima $-1$ and for local
minima $+1$.
Extension to higher dimensions can be done (borrowing from homotopy theory)
by computing a $(D-1)$-form on a $(D-1)$-dimensional hypersurface [10]. We will only
give the main outline without elaborating on the theory of homotopy classes; see
[14] for a detailed discussion of homotopy classes. For the introduction of homotopy
class numbers in image analysis we refer to [10].



2.1 Review on Critical Point Detection 

The main construction lies in the definition of a topological quantity $\nu$ which
describes the behavior of the image gradient vector around a point $P$ of the image.
Suppose $V_P$ is a $D$-dimensional neighborhood around $P$ which does not contain
any critical points except possibly $P$, and let $\partial V_P$ be the $(D-1)$-dimensional closed
oriented hypersurface which is the boundary of $V_P$. Because there are no critical
points on the boundary of $V_P$, we can define the normalized gradient vector field
of the image $L$ on $\partial V_P$

$$\xi_i = \frac{L_i}{\sqrt{L_j L_j}}, \qquad L_i = \partial_i L. \qquad (1)$$

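Throughout the paper a sum over all repeated indices is assumed. Now we give
the operational definition of the topological quantity $\nu$. The quantity $\nu$ is a
surface integral of a $(D-1)$-form over $\partial V_P$. The form $\Phi$ is defined as, see [10],

$$\Phi = \varepsilon_{i_1 i_2 \dots i_D}\, \xi^{i_1}\, d\xi^{i_2} \wedge \dots \wedge d\xi^{i_D}, \qquad (2)$$

where $\varepsilon$ is the Levi-Civita tensor of order $D$, i.e. totally antisymmetric (so it
vanishes whenever two indices coincide) and normalized:

$$\varepsilon_{i_1 \dots i_l \dots i_k \dots i_D} = -\varepsilon_{i_1 \dots i_k \dots i_l \dots i_D}, \qquad (3)$$

$$\varepsilon_{12 \dots D} = 1. \qquad (4)$$

The topological integer number $\nu$ is now given as the natural integral of the form
$\Phi$ over $\partial V_P$:

$$\nu = \frac{1}{A_D} \oint_{Q \in \partial V_P} \Phi(Q). \qquad (5)$$

The factor $A_D$ is the area of a $D$-dimensional hypersphere of radius 1. The
form (2) has the important property that it is a closed form [10], i.e. its total
differential vanishes:

$$d\Phi = 0. \qquad (6)$$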

This property is essential for the applications of the topological quantity (5). If
$W$ is a region where the image has no singularities, then the form $\Phi$ is defined
for the entire region and we can apply the generalized Stokes theorem [1,4]

$$\oint_{\partial W} \Phi = \int_W d\Phi = 0 \qquad (7)$$
because of (6). This has the important implication that the number $\nu$ of (5) is
zero at those points where $L$ is regular. Furthermore, $\nu$ is invariant under smooth
deformations of the oriented hypersurface $\partial W$ as long as no singularities
cross the boundary. This property justifies the term “topological” assigned
to $\nu$, since it depends on the properties of the image at the point $P$ and not on
the surface $\partial W$ around $P$. The number $\nu$ depends only on the number and type
of singularities surrounded by $\partial W$. Therefore (5) defines the topological number
$\nu$ for the image point $P$, as long as the singularities are isolated. The latter is
always true for generic images [10]. We can compute (5) for every location in the
image, obtaining an integer scalar density field $\nu(x_1, \dots, x_D)$ that represents the
distribution of the critical points of the image $L$.

2.2 Detecting Critical Points in 1, 2 and 3 Dimensions 

As we discussed above, in the one-dimensional case the number $\nu$ reduces to

$$\nu = \tfrac{1}{2}\left(\operatorname{sign}(L_x)\big|_b - \operatorname{sign}(L_x)\big|_a\right), \qquad a < x < b, \qquad (8)$$

showing that $\nu$ is $-1$ for maxima, $+1$ for minima and $0$ for regular points.
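On a sampled signal, (8) can be evaluated at every sample by taking $a$ and $b$ to be the neighbouring samples, which makes the interval analogy from the introduction explicit: a nonzero $\nu$ marks an interval, as wide as the sample spacing, that contains a zero of $L_x$. The sketch below assumes forward and backward differences as the one-sided derivatives; the function name is ours.

```python
import numpy as np

def topological_number_1d(signal):
    """nu of (8) at every interior sample of a 1D signal.

    The derivative to the right of sample i is L[i+1] - L[i] and to the left
    L[i] - L[i-1]; nu is half the difference of their signs.
    """
    L = np.asarray(signal, dtype=float)
    d = np.diff(L)
    nu = np.zeros(L.shape)
    nu[1:-1] = 0.5 * (np.sign(d[1:]) - np.sign(d[:-1]))
    return nu

x = np.linspace(0.0, 6.0, 61)
nu = topological_number_1d(np.sin(x))
print(x[nu == -1], x[nu == 1])   # samples nearest pi/2 (maximum) and 3*pi/2 (minimum)
```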

In two dimensions, the form $\Phi$ becomes

$$\Phi = \varepsilon_{ij}\, \xi^i\, d\xi^j = \xi^1 d\xi^2 - \xi^2 d\xi^1, \qquad (9)$$

which is just the angle between the normalized gradients in two neighboring
points. Equation (5) becomes a closed contour integral which integrates this
angle, and we find the winding number associated with this contour. Therefore,
$\nu(x,y)$ equals $+1$ in maxima and minima, $-1$ in non-degenerate saddle points,
and $0$ in regular points. For degenerate saddle points, or so-called monkey
saddles, $\nu$ is $-n + 1$, where $n$ is the number of ridges or valleys converging to the
point.
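The two-dimensional case can be sketched directly as a winding number: sample the gradient on a small circle around the test point, measure the angle between the normalized gradients at neighbouring samples, and sum. This is a minimal illustration assuming an analytic gradient callable and a circular contour of our choosing; on an image one would sample the gradient on pixel neighbourhoods instead.

```python
import numpy as np

def winding_number(grad, center, radius=1.0, n=64):
    """nu in 2D: winding number of the normalized gradient along a circle."""
    t = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    pts = np.asarray(center, float) + radius * np.stack([np.cos(t), np.sin(t)], axis=1)
    g = np.array([grad(p) for p in pts], float)
    theta = np.arctan2(g[:, 1], g[:, 0])               # direction of the gradient
    dtheta = np.diff(np.append(theta, theta[0]))       # close the contour
    dtheta = (dtheta + np.pi) % (2 * np.pi) - np.pi    # angle between neighbouring gradients
    return int(round(dtheta.sum() / (2 * np.pi)))

print(winding_number(lambda p: -2 * p, (0, 0)))                                    # maximum: 1
print(winding_number(lambda p: np.array([2 * p[0], -2 * p[1]]), (0, 0)))           # saddle: -1
print(winding_number(lambda p: np.array([3 * p[0]**2 - 3 * p[1]**2,
                                         -6 * p[0] * p[1]]), (0, 0)))              # monkey saddle: -2
```

The last case is the gradient of $x^3 - 3xy^2$, with three ridges and three valleys meeting at the origin, so $n = 3$ and $-n + 1 = -2$.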

In three dimensions the form becomes

$$\Phi = \xi_i\, d\xi_j \wedge d\xi_k\, \varepsilon^{ijk} = \xi_i\, \partial_l \xi_j\, \partial_m \xi_k\, dx^l \wedge dx^m\, \varepsilon^{ijk}, \qquad (10)$$

where we used $d\xi_j = \partial_l \xi_j\, dx^l$. In the appendix we give the form in Cartesian
coordinates for a surface on which $z$ is constant. It is possible to give a geometrical
interpretation in three dimensions as in one and two dimensions. The form in (10)
is the solid angle determined by the normalized gradients in three neighboring points.
Integrating this solid angle over a closed surface around an image point defines
the number $\nu$. It is $1$ for minima and one type of saddle point, $-1$ for maxima
and another type of saddle point, and $0$ for regular points. The discussion of
the saddle points is deferred to Sect. 2.4.
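The solid-angle interpretation can be sketched as follows. The closed surface is assumed to be a small triangulated sphere (an octahedron refined by midpoint subdivision, our choice), the normalized gradient is evaluated at its vertices, and the signed solid angle of each image triangle is computed with the Van Oosterom and Strackee formula. Dividing the total by $4\pi$ gives the degree of the gradient direction map, which we take as $\nu$; this normalization is an assumption made so that the result takes the values $\pm 1$ quoted above. All names are hypothetical.

```python
import numpy as np

def octahedron():
    V = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                  [0, -1, 0], [0, 0, 1], [0, 0, -1]], float)
    F = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),     # outward-oriented faces
         (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
    return V, F

def subdivide(V, F):
    """Split every triangle into four, pushing new vertices onto the unit sphere."""
    V, cache, new_F = list(V), {}, []
    def mid(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:
            m = V[i] + V[j]
            V.append(m / np.linalg.norm(m))
            cache[key] = len(V) - 1
        return cache[key]
    for a, b, c in F:
        ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
        new_F += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return np.array(V), new_F

def signed_solid_angle(a, b, c):
    """Van Oosterom-Strackee formula for unit vectors a, b, c."""
    num = np.dot(a, np.cross(b, c))
    den = 1.0 + np.dot(a, b) + np.dot(b, c) + np.dot(c, a)
    return 2.0 * np.arctan2(num, den)

def topological_number_3d(grad, center, radius=1.0, levels=2):
    V, F = octahedron()
    for _ in range(levels):
        V, F = subdivide(V, F)
    pts = np.asarray(center, float) + radius * V       # small sphere around P
    xi = np.array([grad(p) for p in pts], float)
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)     # normalized gradient, as in (1)
    total = sum(signed_solid_angle(xi[a], xi[b], xi[c]) for a, b, c in F)
    return int(round(total / (4 * np.pi)))

print(topological_number_3d(lambda p: 2 * p, (0, 0, 0)))                           # minimum: 1
print(topological_number_3d(lambda p: -2 * p, (0, 0, 0)))                          # maximum: -1
print(topological_number_3d(lambda p: 2 * p * np.array([1.0, 1.0, -1.0]), (0, 0, 0)))   # saddle: -1
print(topological_number_3d(lambda p: 2 * p * np.array([1.0, -1.0, -1.0]), (0, 0, 0)))  # saddle: 1
```

The two saddles, $x^2 + y^2 - z^2$ and $x^2 - y^2 - z^2$, land in different groups, which is the two-group split referred to above.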

2.3 Detecting Relative Critical Points 

For detecting relative critical points we project the image gradient vector onto a
local subdimensional vector frame. Let $h_\alpha^i(\mathbf{x})$ be a local subdimensional vector
frame of dimension $D_{cp} \le D$. Roman indices run from 1 to $D$, Greek indices
from 1 to $D_{cp}$. The gradient in this frame is the projection of $L_i$ onto it:

$$L_\alpha(\mathbf{x}) = h_\alpha^i(\mathbf{x})\, L_i(\mathbf{x}), \qquad \alpha = 1, \dots, D_{cp}. \qquad (11)$$

For the detection of the relative critical points we use (5) and (2), but with
the projected, normalized gradient vector $\xi_\alpha$, with $D_{cp}$ replacing $D$ and Greek
indices replacing Roman ones.

One can show that relative critical points, detected in a frame of subdimension
$D_{cp}$, belong to a set which is locally isomorphic to a linear space of
dimension $C_{cp} = D - D_{cp}$. A proof can be found in [8, 9]. Note that $C_{cp}$ is the
codimension of the frame in which the relative critical point is detected.

If, e.g., in three dimensions $D_{cp} = 1$, the local vector frame is of dimension
1 and detection of the critical points reduces to (8), taking half the difference of
the sign of the gradient in the direction of $h_1$. The critical points form a manifold of
dimension 2, i.e. a surface. For $D_{cp} = 2$, we compute the winding number (9)
in the plane spanned by $h_1$ and $h_2$. The points with nonzero winding number form
manifolds of dimension 1, which are strings. For $D_{cp} = 3$ we obtain (10), i.e. (5) in a full
$D$-dimensional neighborhood of the test point. In this case the manifold reduces
to a point.



2.4 Detecting Relative Critical Sets in a Hessian Frame

So far we have made no choice for the vector frame in which to detect the
relative critical sets. In this section we take frames formed by eigenvectors $h^\alpha(\mathbf{x})$
of the local Hessian field $H_{ij}(\mathbf{x}) = \partial_i \partial_j L(\mathbf{x})$. The eigenvectors of the Hessian are
aligned with the principal directions in the image and the eigenvalues $\lambda_i$ measure
the local curvature¹. From now on we assume that the eigenvectors are labeled
in decreasing order of the magnitude of the curvature, i.e. $|\lambda_1| > \dots > |\lambda_D|$. We
take as subdimensional frames the first $D_{cp}$ eigenvectors of the Hessian field.
With this choice we can interpret the relative critical sets as a generalization of
height ridges. In fact, if there are $m_+$ positive and $m_-$ negative eigenvalues, we
define a topological ridge set $R^{m_+,m_-}(L)$ of codimension $D_{cp} = m_+ + m_-$ as
a relative critical set associated with the first $D_{cp}$ eigenvectors corresponding
to the largest absolute eigenvalues of the Hessian field. We exclude the points
at which the Hessian is degenerate. The obtained ridge set contains only those
points at which there are exactly $m_+$ positive and $m_-$ negative eigenvalues among
these $D_{cp}$. Note that there is a close relationship between our ridge set definition and
the one by Damon [2]. In [2] the eigenvalues are ordered by their signed value, and the sets
$R^{m_+,0}$ and $R^{0,m_-}$ coincide for both definitions. For mixed signatures the two
definitions will delineate different topological sets.
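For $D_{cp} = 1$ in two dimensions the Hessian-frame construction reduces to (8) along the eigenvector of largest absolute eigenvalue. The sketch below assumes analytic gradient and Hessian callables (our choice, to keep it short) and holds the frame vector $h$ fixed at the test point while sampling $h \cdot \nabla L$ on either side of it, so the arbitrary sign of the computed eigenvector cancels out of $\nu$. A nonzero $\nu$ together with a negative eigenvalue marks a height ridge point, $R^{0,1}$; with a positive eigenvalue, a valley point, $R^{1,0}$. The function name and step `eps` are hypothetical.

```python
import numpy as np

def hessian_frame_nu(grad, hess, p, eps=1e-3):
    """nu in the 1D Hessian frame (D_cp = 1) at point p, plus the eigenvalue used.

    The frame vector h is the Hessian eigenvector of largest |eigenvalue| at p.
    It is held fixed while the projected gradient h . grad L is sampled at
    p +/- eps h, so the arbitrary sign of h cancels out of nu.
    """
    p = np.asarray(p, dtype=float)
    lam, vecs = np.linalg.eigh(np.asarray(hess(p), dtype=float))
    i = int(np.argmax(np.abs(lam)))
    h = vecs[:, i]
    right = np.sign(np.dot(grad(p + eps * h), h))
    left = np.sign(np.dot(grad(p - eps * h), h))
    return 0.5 * (right - left), lam[i]

# L = -x^2 + 0.1 y: a height ridge along x = 0.
grad_r = lambda q: np.array([-2 * q[0], 0.1])
hess_r = lambda q: np.array([[-2.0, 0.0], [0.0, 0.0]])
print(hessian_frame_nu(grad_r, hess_r, (0.0, 0.5)))   # nu = -1 with eigenvalue -2: ridge point
print(hessian_frame_nu(grad_r, hess_r, (0.3, 0.5)))   # nu = 0: not on the ridge
```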

The number and signs of the eigenvalues put a natural label on the ridge
sets. If all $D_{cp}$ eigenvalues are negative we obtain a height ridge, whereas for
all eigenvalues positive we get a valley. In the general case where both $m_+ \neq 0$
and $m_- \neq 0$ we can speak of “saddle” ridges. The definition extends to the case
$D_{cp} = D$ when the ridge is of dimension zero. If all $D$ eigenvalues are negative
we are dealing with a maximum and if they are all positive with a minimum.
For mixed numbers of signs we get saddle points of different signature.

¹ These curvatures are to be distinguished from the isophote curvature.

In the three-dimensional case, as discussed at the end of Sect. 2.2, we have
four different critical points, which using (10) and (5) can only be divided into two
groups. With the use of the ridge sets $R^{m_+,m_-}$ we can differentiate between all
four signatures. This clearly shows that the choice of the Hessian field, through the
ridge sets, allows for a richer description of the relative critical sets than (5) does.
In Table 1 we give an overview of all relative critical sets which can be detected
in three dimensions using the Hessian frame.

The topological ridge sets have a few other properties we would like to discuss. First,
as a direct consequence of their definition we can infer the following inclusion
relation



This relation shows that ridge sets can contain lower dimensional ridges as
subsets. For example, a maximum can be included in a positive string or a positive
surface. In Sect. 4 this property will prove to be important in the detection of
elongated structures at a scale of maximum response while establishing a link
to the finest scale simultaneously.

As a second property, we remark that one can prove that topological
ridge sets are locally orthogonal to the Hessian vector frame; see [8].



Table 1. Classification of the relative critical sets that can be detected in three dimensions using a Hessian frame. The value of $m_+ + m_-$ determines the codimension of the detected set. The cases in which $m_+ + m_- > D$ are marked with '-'. Two types of saddle points can be found.



              m- = 0             m- = 1             m- = 2             m- = 3
  m+ = 0      regular            negative surface   negative string    minimum
  m+ = 1      positive surface   saddle string      saddle point (1)   -
  m+ = 2      positive string    saddle point (2)   -                  -
  m+ = 3      maximum            -                  -                  -
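Table 1 can be read as a lookup from the sign counts of the selected Hessian eigenvalues to the name of the set. The sketch below assumes that $m_+$ counts negative eigenvalues (height-ridge directions) and $m_-$ positive ones, which is how the table reads (three negative eigenvalues give a maximum). It only does the labelling; the full ridge test also requires the gradient projections onto the selected eigenvectors to vanish. `TABLE_1` and `label_ridge_set` are hypothetical names.

```python
# Table 1 as a lookup: (m_plus, m_minus) -> name of the relative critical set.
TABLE_1 = {
    (0, 0): "regular", (0, 1): "negative surface",
    (0, 2): "negative string", (0, 3): "minimum",
    (1, 0): "positive surface", (1, 1): "saddle string", (1, 2): "saddle point (1)",
    (2, 0): "positive string", (2, 1): "saddle point (2)",
    (3, 0): "maximum",
}


def label_ridge_set(eigenvalues):
    """Name the ridge set from the signs of the selected Hessian eigenvalues (D = 3).

    Assumption: m_plus counts negative eigenvalues (height-ridge directions) and
    m_minus counts positive ones, so three negative eigenvalues give "maximum".
    Only the labelling is done; the vanishing-gradient condition is not checked.
    """
    if len(eigenvalues) > 3:
        raise ValueError("Table 1 covers at most three eigenvalues")
    if any(v == 0 for v in eigenvalues):
        raise ValueError("degenerate point: zero eigenvalue")
    m_plus = sum(1 for v in eigenvalues if v < 0)
    m_minus = sum(1 for v in eigenvalues if v > 0)
    return TABLE_1[(m_plus, m_minus)]
```

For instance, two positive eigenvalues label a point as lying on a negative string, and one of each sign as lying on a saddle string.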



3 Relative Critical Sets in Scale Space 

In Sect. 2 we have shown how to detect relative critical sets for images of arbitrary dimensions. Our aim here is to detect objects and structures in scale space at a scale of maximum response. In this section we focus on the detection of relative critical sets in linear Gaussian scale spaces of two-dimensional images. Note that, like [16], we look for the critical sets in scale space and not for the evolution across scale of the critical sets of the image itself [13]. Therefore we regard the scale space of a two-dimensional image $L(x, y)$ as a three-dimensional entity, i.e. an image that depends on the three coordinates $(x, y, \sigma)$:



$$L(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \int L(x', y') \exp\left(-\frac{(x - x')^2 + (y - y')^2}{2\sigma^2}\right) dx'\,dy' \qquad (13)$$



In doing so, we must take into account that, in its present form (13), Gaussian scale space is not suited for grouping image structures by detecting critical sets. For the detection of an elongated structure at its scale of maximum response, the strings of Table 1, for example, seem the obvious choice. However, the sets that can be found are restricted by the properties of the Gaussian filters. For example, it is well known that there are no local extrema in scale space. Indeed, if $L_x = L_y = 0$ and $H_{xx}$ and $H_{yy}$ have the same sign, which is required for an extremum, we always have for non-umbilic points that $\partial_\sigma L = \sigma(H_{xx} + H_{yy}) \neq 0$.
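For the special case of an isotropic Gaussian blob of width $\sigma_0$, all quantities at its centre are available in closed form, so the identity $\partial_\sigma L = \sigma(H_{xx} + H_{yy})$ and its sign can be checked numerically. The following is a minimal sketch with hypothetical function names. It illustrates that the centre response of this blob decreases monotonically with scale, so it has no extremum in $\sigma$.

```python
import math


def centre_value(sigma, sigma0):
    """L(0, 0, sigma) from (13) for a unit-mass isotropic Gaussian blob of width sigma0."""
    return 1.0 / (2.0 * math.pi * (sigma0**2 + sigma**2))


def centre_laplacian(sigma, sigma0):
    """H_xx + H_yy at the blob centre; each term equals -L / (sigma0^2 + sigma^2)."""
    return -2.0 * centre_value(sigma, sigma0) / (sigma0**2 + sigma**2)


def d_sigma(f, sigma, h=1e-5):
    """Central finite difference in the scale direction."""
    return (f(sigma + h) - f(sigma - h)) / (2.0 * h)
```

Comparing `d_sigma(lambda s: centre_value(s, 2.0), sigma)` with `sigma * centre_laplacian(sigma, 2.0)` shows the two agree and are negative for every $\sigma > 0$.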

Extremal points can still be detected in scale space if we modify the Gaussian kernels by multiplying them with a monotonically increasing form factor $\phi(\sigma)$, which yields a deformed scale space of the image:

$$L(x, \sigma) \rightarrow \phi(\sigma) L(x, \sigma) = \tilde{L}(x, \sigma) \qquad (14)$$



The factor $\phi$ carries the essential information of the desired model structures we want to localize in scale space. Note that the locations of the critical points ($L_i = 0$, $i \in \{x, y\}$), and therefore the locations of the catastrophes, do not change. For appropriate choices of the form factor $\phi$ it becomes possible to make $\tilde{L}_\sigma$ equal to zero and to find extrema in the deformed image.

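As an example, consider in a $D$-dimensional image a $C_{cp}$-dimensional Gaussian ridge that is aligned with the first $D_{cp} = D - C_{cp}$ coordinate axes:

$$L(x, 0) = \prod_{\alpha=1}^{D_{cp}} \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left(-\frac{x_\alpha^2}{2\sigma_0^2}\right) \qquad (15)$$

The scale-space representation of (15) reads

$$L(x, \sigma) = \prod_{\alpha=1}^{D_{cp}} \frac{1}{\sqrt{2\pi(\sigma^2 + \sigma_0^2)}} \exp\left(-\frac{x_\alpha^2}{2(\sigma^2 + \sigma_0^2)}\right) \qquad (16)$$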

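If we take the form factor $\phi(\sigma) = \sigma^\gamma$, similar to [13], the derivatives $\tilde{L}_{x_\alpha}$ and $\tilde{L}_\sigma$ are zero for

$$x_\alpha = 0 \quad \text{and} \quad \sigma = \sqrt{\frac{\gamma}{D_{cp} - \gamma}}\,\sigma_0 \qquad (17)$$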

which defines a $C_{cp}$-dimensional surface in scale space. Note that $D$ here refers to the image dimension; the scale space has dimension $D + 1$. Equation (17) shows that an extremum in the scale direction can only be generated for $\gamma$ in the range $0 < \gamma < D_{cp}$. As $\gamma \to 0$ the extremum in the scale direction goes to zero, whereas as $\gamma \to D_{cp}$ it moves to infinity.

In general, the valid range of $\gamma > 0$ is determined by the profile of the ridge, and a choice of $\gamma$ close to zero seems reasonable.
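Equation (17) follows from setting $\partial_\sigma \log(\sigma^\gamma L(0, \sigma)) = \gamma/\sigma - D_{cp}\sigma/(\sigma^2 + \sigma_0^2) = 0$ for the ridge (16). The sketch below compares the closed form with a brute-force search over scale. The grid bounds and step are arbitrary choices, and the function names are hypothetical.

```python
import math


def deformed_ridge_centre(sigma, sigma0, gamma, d_cp):
    """sigma**gamma * L(0, sigma) for the Gaussian ridge (16) at x_alpha = 0."""
    return sigma**gamma * (2.0 * math.pi * (sigma**2 + sigma0**2)) ** (-d_cp / 2.0)


def optimal_scale(sigma0, gamma, d_cp):
    """Closed form (17); an extremum in scale exists only for 0 < gamma < D_cp."""
    if not 0.0 < gamma < d_cp:
        raise ValueError("need 0 < gamma < D_cp")
    return math.sqrt(gamma / (d_cp - gamma)) * sigma0


def argmax_on_grid(f, lo=0.001, hi=20.0, step=0.001):
    """Brute-force maximiser of f on a uniform grid (for cross-checking only)."""
    n = int(round((hi - lo) / step))
    return max((f(lo + i * step), lo + i * step) for i in range(n + 1))[1]
```

With $D_{cp} = 1$ and $\gamma = 0.5$, (17) gives $\sigma = \sigma_0$. With $D_{cp} = 2$, $\sigma_0 = 3$ and $\gamma = 0.5$ it gives $\sqrt{1/3}\cdot 3 \approx 1.73$.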
4 Examples



In this section we give several examples of the constructions made in the previous sections. Our main focus is on the detection of elongated structures at a scale of maximum response with respect to a form factor $\phi$. First, however, we show the difference between the non-deformed (13) and a deformed (14) scale space of an anisotropic Gaussian, $L(x, y) = (2\pi\sigma_x\sigma_y)^{-1}\exp\big(-(x/(\sqrt{2}\sigma_x))^2 - (y/(\sqrt{2}\sigma_y))^2\big)$, which we regard as a two-dimensional elongated, ridge-like structure. For both scale spaces we detect all detectable sets. Figure 1 shows the image, with $\sigma_x = 2.0$ and $\sigma_y = 30.0$ pixels, in the left frame; the detected sets of the non-deformed scale space in the middle frame; and the detected sets of the deformed scale space in the right frame. The left frame is shown again at the bottom of each box. The light grey surface in the middle and right frames is the ridge set $R^{0,1}$, i.e. $m_+ = 0$ and $m_- = 1$. The dark grey string in the middle frame is a one-dimensional ridge set that represents the scale evolution of the maximum of the anisotropic Gaussian. In accordance with (12), the string is a subset of the surface. In the right frame we have detected more strings and a maximum, which is depicted in white. A straightforward calculation shows that the maximum is found at



$$\sigma = \sqrt{\frac{(\gamma - 1)(\sigma_x^2 + \sigma_y^2) + \sqrt{(\gamma - 1)^2(\sigma_x^2 - \sigma_y^2)^2 + 4\sigma_x^2\sigma_y^2}}{2(2 - \gamma)}} \qquad (18)$$



We used $\gamma = 0.5$. The maximum is found at scale $\sigma = 1.98$, in agreement with (18).
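Setting $\partial_\sigma \log(\sigma^\gamma L(0, 0, \sigma)) = 0$ for the anisotropic Gaussian gives the quadratic $(2 - \gamma)\sigma^4 - (\gamma - 1)(\sigma_x^2 + \sigma_y^2)\sigma^2 - \gamma\sigma_x^2\sigma_y^2 = 0$ in $\sigma^2$. Its positive root is (18). The sketch below evaluates (18) for the parameters of Fig. 1 and can be cross-checked against a grid search over the deformed centre response. The function names are hypothetical, and the formula assumes $0 < \gamma < 2$.

```python
import math


def max_scale_anisotropic(sx, sy, gamma):
    """Scale (18) of the maximum of sigma**gamma * L(0, 0, sigma); needs 0 < gamma < 2."""
    root = math.sqrt((gamma - 1) ** 2 * (sx**2 - sy**2) ** 2 + 4 * sx**2 * sy**2)
    return math.sqrt(((gamma - 1) * (sx**2 + sy**2) + root) / (2 * (2 - gamma)))


def deformed_centre(sigma, sx, sy, gamma):
    """Deformed scale-space value at the centre of the anisotropic Gaussian."""
    return sigma**gamma / (2 * math.pi * math.sqrt((sx**2 + sigma**2) * (sy**2 + sigma**2)))


s_star = max_scale_anisotropic(2.0, 30.0, 0.5)  # parameters of Fig. 1
```

Here `s_star` evaluates to about 1.98, the value reported above.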

The inclusion relation (12) between the relative critical sets can be used to establish a link between the detected scale-space structure at its optimal scale and the location of this structure at the original scale. In the example above, the connection is provided by the vertical string in the right frame of Fig. 1.

For the examples in the rest of this paper we consider only deformed scale 
spaces. 

The horizontal string in the right frame of Fig. 1 can be used to detect elongated structures in an image at their scale of maximum response with respect to the form factor $\phi(\sigma)$. We can modify the intrinsically defined Hessian vector frame to reduce the response of the vertical strings, which represent the evolution of extrema in scale space. Since the elongated structures are one-dimensional structures in the original image and we want them to be detected at a scale of maximum response in scale space, we define the vector frame as follows: one vector always points in the direction of increasing scale, whereas the other is the eigenvector belonging to the largest curvature of the two-dimensional Hessian $H_{ij} = L_{ij}(x, y, \sigma)$, $i, j \in \{x, y\}$. We can use the signs of the eigenvalues to discriminate between saddle strings, maxima strings and minima strings if we take the value of the second derivative in the scale direction as the eigenvalue belonging to the vector $(0, 0, 1)$.
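The sketch below applies the modified Hessian frame at a single scale-space point. It assumes that "largest curvature" means the eigenvalue of largest magnitude, and that the spatial Hessian entries, the first derivatives and $L_{\sigma\sigma}$ have already been computed (for example with Gaussian derivative filters). The labels follow the sign rule in the text. The relative-criticality test checks that the gradient projections onto both frame vectors vanish. All helper names are hypothetical.

```python
import math


def dominant_eigenpair(hxx, hxy, hyy):
    """Largest-magnitude eigenvalue of [[hxx, hxy], [hxy, hyy]] and a unit eigenvector."""
    mean = 0.5 * (hxx + hyy)
    rad = math.hypot(0.5 * (hxx - hyy), hxy)
    lam = mean + rad if abs(mean + rad) >= abs(mean - rad) else mean - rad
    if hxy != 0.0:
        vx, vy = lam - hyy, hxy            # solves (H - lam I) v = 0
    elif abs(lam - hxx) <= abs(lam - hyy):
        vx, vy = 1.0, 0.0                  # diagonal Hessian: x axis
    else:
        vx, vy = 0.0, 1.0                  # diagonal Hessian: y axis
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)


def classify_scale_string(hxx, hxy, hyy, l_ss):
    """Label by the signs of the dominant spatial curvature and L_sigma_sigma."""
    lam, v = dominant_eigenpair(hxx, hxy, hyy)
    if lam < 0 and l_ss < 0:
        return "maxima string", v
    if lam > 0 and l_ss > 0:
        return "minima string", v
    if lam == 0 or l_ss == 0:
        return "degenerate", v
    return "saddle string", v


def is_relative_critical(lx, ly, l_s, v, tol=1e-9):
    """Gradient projections onto both frame vectors, (v, 0) and (0, 0, 1), vanish."""
    return abs(lx * v[0] + ly * v[1]) < tol and abs(l_s) < tol
```

A point on a bright ridge running along $y$ therefore has $v = (1, 0)$. It can be relatively critical even when $L_y \neq 0$, because the gradient component along the ridge is not constrained.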

In the next example we apply the vector frame defined above to the detection of two perpendicular Gaussian ridges with scales $\sigma_1 = 2.0$ and $\sigma_2 = 5.0$, respectively. Figure 2 depicts the image in the left frame and the strings with respect to the modified Hessian frame in the right frame. We will refer to these strings as scale space strings. The $\gamma$ value used here is 0.5, and the maximum responses are found at $\sigma = 2.0$ and $\sigma = 5.0$, in agreement with (17), which for $D_{cp} = 1$ and $\gamma = 0.5$ gives $\sigma = \sigma_0$. The strings are broken in the middle because of interference between the two ridges. In this region there is no well-defined ridge, except at scale $\sigma = 3.8$, where we find a short string.

Now we consider the image in the top left frame of Fig. 3, which consists of 
a sequence of identical horizontal bars equally spaced along the vertical axis. In 
the top right frame we show the strings and observe that the bars give maximal response at two distinct scales. At fine scales the bars are detected separately as elongated structures, but at larger scales they have grouped together into one elongated structure in the perpendicular direction: the objects group themselves into different structures at different scales. The bottom left frame shows that the ridges change their orientation continuously; a magnification of the ridges is included in the bottom right frame. In accordance with (12), the strings always lie on a higher-dimensional ridge (a vertical surface), which provides the connection to the original scale.

As a final example we detect the vessel structure of a retina in a two-dimensional fundus reflection image from a scanning laser ophthalmoscope (left frame of Fig. 4). In this example we used the form factor $\phi(\sigma) = (\sigma/(\sigma + \sigma_0))^{\gamma}$. The positive strings are depicted in the right frame.
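For the form factor used on the retina image, $\phi(\sigma) = (\sigma/(\sigma + \sigma_0))^{\gamma}$, the scale of maximum response of a ridge can be found in the same way as (17). The sketch assumes a one-dimensional Gaussian ridge profile of width $w$, that is, $D_{cp} = 1$ in (16). Setting the scale derivative of $\log(\phi L)$ to zero then gives the cubic $\sigma^3 + (1 - \gamma)\sigma_0\sigma^2 - \gamma\sigma_0 w^2 = 0$. This cubic has exactly one positive root for every $\gamma > 0$, so, unlike $\phi = \sigma^\gamma$, this form factor never pushes the optimum to infinity. The cubic is derived here for illustration and is not stated in the text, and the function names are hypothetical.

```python
import math


def retina_form_factor(sigma, gamma=0.5, sigma0=1.0):
    """phi(sigma) = (sigma / (sigma + sigma0))**gamma, the form factor of Fig. 4."""
    return (sigma / (sigma + sigma0)) ** gamma


def ridge_response(sigma, w, gamma=0.5, sigma0=1.0):
    """phi(sigma) * L(0, sigma) for a 1-D Gaussian ridge profile of width w."""
    return retina_form_factor(sigma, gamma, sigma0) / math.sqrt(2 * math.pi * (sigma**2 + w**2))


def optimal_scale(w, gamma=0.5, sigma0=1.0, iters=200):
    """Bisection for the positive root of s^3 + (1 - gamma) sigma0 s^2 - gamma sigma0 w^2."""
    f = lambda s: s**3 + (1 - gamma) * sigma0 * s**2 - gamma * sigma0 * w**2
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:          # f(0) < 0, so grow hi until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $\sigma_0 = 1$ and $\gamma = 0.5$ as in Fig. 4, a ridge of width $w = 2$ peaks at about $\sigma \approx 1.11$, below its width. With $\phi = \sigma^{0.5}$ the same ridge would peak at $\sigma = 2$.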

All these examples lead us to the observation that the deformed scale spaces 
can serve as a grouping mechanism. 




Fig. 1. Example 1: The left frame shows an image of an anisotropic Gaussian blob with $\sigma_x = 2.0$ and $\sigma_y = 30.0$ pixels. The middle and right frames show the detected ridge sets for a non-deformed and a deformed scale space, respectively. The scale runs from 1.0 pixel at the bottom of the box to 4.0 pixels at the top. Light grey, dark grey and white correspond to different ridge sets (surface, strings and maximum, respectively).
Fig. 2. Example 2: Scale space strings of two Gaussian ridges.





Fig. 3. Example 3: Scale space grouping of bars. The top left frame shows the original image. In the top right frame the scale space strings are depicted. In the bottom left frame we show the ridge sets. The bottom right frame is a magnification of the bottom left frame. We used $\gamma = 0.25$.




Fig. 4. Example 4: The left frame depicts a fundus reflection image of the eye. The interlacing artefacts are due to the acquisition method. The right frame shows the detected scale space strings. We used $\phi(\sigma) = (\sigma/(\sigma + \sigma_0))^{\gamma}$, $\sigma_0 = 1$, $\gamma = 0.5$. Scale runs exponentially from 1.0 pixel to 4.0 pixels in 32 steps.
5 Discussion

In the present paper we reviewed a constructive definition of relative critical sets 
in images of any number of spatial dimensions. The definition is very flexible 
because it associates critical sets to an arbitrarily chosen local vector frame 
field. Depending on the visual task, different model structures can be identified 
with the relative critical sets. As a consequence our construction can be purely 
intrinsic (defined only by the image structures), or it can involve externally 
specified frames. We demonstrated that the intrinsic Hessian frame leads to the 
detection of ridge sets. As an externally specified frame we defined a modified 
Hessian frame for the detection of elongated structures at their scale of maximum
response with respect to a multiplicative form factor. This factor together with
a selected vector frame contains the model information we want to incorporate 
in our detection strategy. 

The relative critical sets are in general connected sub-manifolds. Therefore, our technique indeed provides a method for perceptual grouping achieved with only local measurements. In a sense, such a technique can be viewed as a particular generalization of thresholding techniques, where the connected entities are the level surfaces (or lines in 2D).

All examples showed the grouping properties of the system of ridges for optimal scale selection of multi-scale connected objects. The method also provides a link from the scale-space structure down to the original image space. We observe that scale space itself acts as a grouping mechanism. We refer to such applications as topological deep structure analysis.

A Appendix 

Here we give an expression in Cartesian coordinates for the form (2) in three 
dimensions. As shown in (10), in three dimensions (2) reads 

$$\phi = \epsilon_{ijk}\,\xi_i\,\partial_l\xi_j\,\partial_m\xi_k\,dx^l \wedge dx^m \qquad (19)$$

Performing the contraction on $l$ and $m$ gives

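$$\phi = \epsilon_{ijk}\,\xi_i\Big((\partial_x\xi_j\,\partial_y\xi_k - \partial_y\xi_j\,\partial_x\xi_k)\,dx \wedge dy + (\partial_y\xi_j\,\partial_z\xi_k - \partial_z\xi_j\,\partial_y\xi_k)\,dy \wedge dz + (\partial_z\xi_j\,\partial_x\xi_k - \partial_x\xi_j\,\partial_z\xi_k)\,dz \wedge dx\Big) \qquad (20)$$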

On a surface on which $z$ is constant, (20) reduces to

$$\phi = \epsilon_{ijk}\,\xi_i\,(\partial_x\xi_j\,\partial_y\xi_k - \partial_y\xi_j\,\partial_x\xi_k)\,dx \wedge dy \qquad (21)$$

Performing the contraction on j and k gives 

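$$\phi = 2\Big(\xi_x(\partial_x\xi_y\,\partial_y\xi_z - \partial_y\xi_y\,\partial_x\xi_z) + \xi_y(\partial_x\xi_z\,\partial_y\xi_x - \partial_y\xi_z\,\partial_x\xi_x) + \xi_z(\partial_x\xi_x\,\partial_y\xi_y - \partial_y\xi_x\,\partial_x\xi_y)\Big)\,dx \wedge dy \qquad (22)$$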

Similar relations hold for the other surfaces. 
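Writing the vector field of (19)–(22) as $\xi$, the contraction over $j$ and $k$ turns the coefficient of $dx \wedge dy$ into twice a scalar triple product: $\epsilon_{ijk}\xi_i(\partial_x\xi_j\partial_y\xi_k - \partial_y\xi_j\partial_x\xi_k) = 2\,\xi\cdot(\partial_x\xi \times \partial_y\xi)$. This form is convenient for implementation. The sketch below checks the index bookkeeping numerically, assuming $\xi$ and its two derivatives are given as 3-vectors at one point. The function names are hypothetical.

```python
import itertools


def levi_civita(i, j, k):
    """epsilon_ijk for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def form_coefficient(xi, dx_xi, dy_xi):
    """dx^dy coefficient of (21): eps_ijk xi_i (d_x xi_j d_y xi_k - d_y xi_j d_x xi_k)."""
    total = 0.0
    for i, j, k in itertools.product(range(3), repeat=3):
        e = levi_civita(i, j, k)
        if e:
            total += e * xi[i] * (dx_xi[j] * dy_xi[k] - dy_xi[j] * dx_xi[k])
    return total


def explicit_22(xi, a, b):
    """Coefficient in (22) with a = d_x xi, b = d_y xi and indices (x, y, z) = (0, 1, 2)."""
    return 2 * (xi[0] * (a[1] * b[2] - b[1] * a[2])
                + xi[1] * (a[2] * b[0] - b[2] * a[0])
                + xi[2] * (a[0] * b[1] - b[0] * a[1]))


def triple_product(u, a, b):
    """u . (a x b)."""
    cx = a[1] * b[2] - a[2] * b[1]
    cy = a[2] * b[0] - a[0] * b[2]
    cz = a[0] * b[1] - a[1] * b[0]
    return u[0] * cx + u[1] * cy + u[2] * cz
```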
Abstract. This paper shows how the performance of feature trackers
can be improved by building a view-based object representation consist- 
ing of qualitative relations between image structures at different scales. 
The idea is to track all image features individually, and to use the qual- 
itative feature relations for resolving ambiguous matches and for intro- 
ducing feature hypotheses whenever image features are mismatched or 
lost. Compared to more traditional work on view-based object tracking, 
this methodology has the ability to handle semi-rigid objects and par- 
tial occlusions. Compared to trackers based on three-dimensional object 
models, this approach is much simpler and of a more generic nature. A 
hands-on example is presented showing how an integrated application 
system can be constructed from conceptually very simple operations. 



1 Introduction 

To maintain a stable representation of a dynamic world, it is necessary to relate 
image data from different time moments. When analysing image sequences frame 
by frame, as is commonly done in computer vision applications, it is therefore
useful to include an explicit tracking mechanism in the vision system.

When constructing such a tracking mechanism, there is considerable freedom in design concerning how much a priori information should be included in and used by the tracker. If the goal is to track a single object of known shape, then it may be natural to build a three-dimensional object model and to relate computed views of this internal model to the image data that occur. An alternative approach is to store a large number of actual views in a database and subsequently match these to the image sequence.

Depending on what type of object representation we choose, we can expect 
different trade-offs between the complexity of constructing the object representa- 
tion and the complexity in matching the object representation to image data. In 
particular, different design strategies will imply different amounts of additional 
work when the database is extended with new objects. 

The subject of this article is to advocate the use of qualitative multi-scale 
object models in this context, as opposed to more detailed models. The idea is 
to represent only dominant image features of the object, and relations between 
those that are reasonably stable under view variations. In this way, a new ob- 
ject model can be constructed with only minor additional work, and it will be 
demonstrated that such a weaker approach to object representation is powerful
enough to give a significant improvement in the robustness of feature trackers. 
Specifically, we will show how an integrated non-trivial application to human- 
computer interaction can be constructed in a straightforward and conceptually 
very simple way, by combination with a set of elementary scale-space operations. 

2 Choice of Image Representation for Feature Tracking

The framework we consider is one in which image features are detected at 
multiple scales. Each feature is associated with a region in space as well as a 
range of scales, and relations between features at different scales impose hi- 
erarchical links across scales. Specifically, we assume that the image features 
are detected with a mechanism for automatic scale selection. In earlier work 
(Bretzner & Lindeberg 1998a), we have demonstrated how such a scale selection
mechanism is essential to obtain a robust behaviour of the feature tracker if the 
image features undergo large size variations in the image domain. 

The rationale for using a hierarchical multi-scale image representation for fea- 
ture tracking originates from the well-known fact that real-world objects consist 
of different types of structures at different scales. An internal object representa- 
tion should reflect this fact. One aspect of this, which we shall make particular 
use of, is that certain hierarchical relations over scales tend to remain reasonably 
stable when the viewing conditions are varied. Thus, even if some features are 
lost during tracking (e.g. due to occlusions, illumination variations, or spurious 
errors by the feature detector or the feature matching algorithm), it is rather 
likely that a sufficient number of image features remain to support the tracking of 
the other features. Thereby, the feature tracker will have higher robustness with 
respect to occlusions, viewing variations and spurious errors in the lower-level 
modules. As we shall see, the qualitative nature of these feature relations will 
also make it possible to handle semi-rigid objects within the same framework. 

In this way, the approach we will propose is closely related to the notion 
of object representation. Compared to the more traditional problem of object 
recognition, however, the requirements are different, since the primary goal is to 
maintain a stable image representation over time, and we do not need to support 
indexing and recognition functionalities into large databases. For these reasons, 
a qualitative image representation can be sufficient in many cases, and offer a 
higher flexibility by being more generic than detailed object models. 

Related works. The topic of this paper touches on both feature tracking and object representation. The literature on tracking is large and impossible to review here; hence, we focus on the most closely related works.

Image representations involving linking across scales have been presented by 
several authors. (Crowley & Parker 1984, Crowley & Sanderson 1987) detected 
peaks and ridges in a pyramid representation. In retrospect, a main reason why 
stability problems were encountered is that the pyramids involved a rather coarse 
sampling in the scale direction. (Koenderink 1984) defined links across scales using iso-intensity paths in scale-space, and this idea was made operational for medical image segmentation by (Lifshitz & Pizer 1990) and (Vincken et al. 1997).
(Lindeberg 1993) constructed a scale-space primal sketch, in which a morpho- 
logical support region was associated with each extremum point and paths of 
critical points over scales were computed delimited by bifurcations. (Olsen 1997) 
applied a similar approach to watershed minima in the gradient magnitude. 
(Griffin et al. 1992) developed a closely related approach based on maximum 
gradient paths, however, at a single scale. In the scale-space primal sketch, scale selection was performed by maximizing measures of blob strength over scales, and significance was measured by the volumes that image structures occupy in scale-space, involving the stability over scales as a major component. A generalization of this scale selection idea to more general classes of image structures was presented in (Lindeberg 1994, Lindeberg 1998b, Lindeberg 1998a), by detecting scale-space maxima, i.e. points in scale-space at which normalized differential measures of feature strength assume local maxima with respect to scale.
(Pizer et al. 1994) have proposed closely related descriptors,
focusing on multi-scale ridge representations for medical image analysis. Psy- 
chophysical results by (Burbeck & Pizer 1995) support the belief that such hi- 
erarchical multi-scale representations are relevant for object representation. 

With respect to the problem of object recognition, (Shokoufandeh et al. 1998) 
detect extrema in a wavelet transform in a way closely related to the detection of 
scale-space maxima, and define a graph structure from these image features. This 
graph structure is then matched to corresponding descriptors for other objects, 
based on topological and geometric similarity. Among the large number of works on model-based tracking, the following share aims with our approach: (Koller et al. 1993) used car models to support the tracking of vehicles in long sequences with occlusions and illumination variations. (Smith & Brady 1995) defined clusters of coherently moving corner features so as to support the tracking of cars in a qualitative manner. (Black & Jepson 1998b) constructed a view-based object representation using an eigenimage approach to compactly represent and support the tracking of an object seen from a large number of different views. The recently developed condensation algorithm (Isard & Blake 1998, Black & Jepson 1998a) is of particular interest, since it explicitly constructs statistical distributions to capture relations between image features. Concerning the specific application to qualitative hand tracking
that will be addressed in this paper, more detailed hand models have been pre- 
sented by (Kuch & Huang 1995, Heap & Hogg 1996, Yasumuro et al. 1999). Re- 
lated graph-like representations for hand tracking and face tracking have been 
presented by (Triesch & von der Malsburg 1996, Mauerer & von der Malsburg 
1996). 

3 Image Features and Qualitative Feature Relations 

We are interested in representing objects which can give rise to a rich variety of 
image features of different types and at different scales. Generically, these image 
features can be (i) zero-dimensional (junctions), (ii) one-dimensional (edges and 
ridges), or (iii) two-dimensional (blobs), and we assume that each image feature
is associated with a region in space as well as a range of scales. 



3.1 Computation of Image Features 

When computing a hierarchical view-based object representation, one may at 
first desire to compute a detailed representation of the multi-scale image struc- 
ture, as done by the scale-space primal sketch or some of the closely related 
representations reviewed in section 2. Since we are interested in processing temporal image data, however, and the construction of such a representation from image data requires a rather large amount of computation, we shall here follow a computationally more efficient approach.

We focus on image features expressed in terms of scale-space maxima, i.e. 
points in scale-space at which differential geometric entities assume local maxima 
with respect to space and scale. Formally, such points are defined by 

$$\big(\nabla(\mathcal{D}_{norm} L(x; s)) = 0\big) \wedge \big(\partial_s(\mathcal{D}_{norm} L(x; s)) = 0\big) \qquad (1)$$

where $L(\cdot; s)$ denotes the scale-space representation of the image $f$ constructed by convolution with a Gaussian kernel $g(\cdot; s)$ with scale parameter (variance) $s$, and $\mathcal{D}_{norm}$ is a differential invariant normalized by replacing all spatial derivatives $\partial_{x_i}$ by $\gamma$-normalized derivatives $\partial_{\xi_i} = s^{\gamma/2}\,\partial_{x_i}$.

Two examples of such differential descriptors, which we shall make particular use of here, are the normalized Laplacian (with $\gamma = 1$) for blob detection

$$\nabla^2_{norm} L = s\,(L_{xx} + L_{yy}) \qquad (2)$$



and the square difference between the eigenvalues $L_{pp}$ and $L_{qq}$ of the Hessian matrix (with $\gamma = 3/4$) for ridge detection

$$\mathcal{A}_{\gamma\text{-}norm} L = s^{2\gamma}\,(L_{pp} - L_{qq})^2 = s^{2\gamma}\big((L_{xx} - L_{yy})^2 + 4L_{xy}^2\big) \qquad (3)$$



see (Lindeberg 1998b, Lindeberg 1998a) for a more general description. A computationally very attractive property of this construction is that the scale-space
maxima can be computed by architecturally very simple and computationally 
highly efficient operations involving: (i) scale-space smoothing, (ii) pointwise 
computation of differential invariants, and (iii) detection of local maxima. 
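The three operations can be sketched with NumPy for the blob case. This is a minimal illustration, not the authors' implementation, and several of its choices are assumptions:

- Sampled Gaussian kernels are truncated at 4 standard deviations.
- The sign of (2) is flipped so that bright blobs give maxima, matching the bright-blob requirement used for the hand model below.
- A point is kept if it exceeds all 26 neighbours in (scale, y, x) and a relative threshold.

The block also includes (3), to show the identity $(L_{pp} - L_{qq})^2 = (L_{xx} - L_{yy})^2 + 4L_{xy}^2$. Function names and the threshold value are hypothetical.

```python
import numpy as np


def gaussian_kernels(s):
    """Sampled 1-D Gaussian with variance s (truncated at 4 std) and its 2nd derivative."""
    radius = int(np.ceil(4.0 * np.sqrt(s)))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * s))
    g /= g.sum()
    g2 = (x**2 / s**2 - 1.0 / s) * g
    return g, g2


def separable(img, k_rows, k_cols):
    """Filter along axis 0 with k_rows, then along axis 1 (kernels must fit in the image)."""
    tmp = np.apply_along_axis(np.convolve, 0, img, k_rows, mode="same")
    return np.apply_along_axis(np.convolve, 1, tmp, k_cols, mode="same")


def blob_response(img, s):
    """(i) smoothing and (ii) pointwise invariant: -s (L_xx + L_yy), i.e. -(2) with gamma = 1."""
    g, g2 = gaussian_kernels(s)
    lxx = separable(img, g, g2)   # second derivative along x (axis 1)
    lyy = separable(img, g2, g)   # second derivative along y (axis 0)
    return -s * (lxx + lyy)


def ridge_strength(lxx, lxy, lyy, s, gamma=0.75):
    """(3): s^(2 gamma) ((L_xx - L_yy)^2 + 4 L_xy^2)."""
    return s ** (2 * gamma) * ((lxx - lyy) ** 2 + 4 * lxy ** 2)


def scale_space_maxima(img, scales, rel_threshold=0.5):
    """(iii) points beating all 26 neighbours in (scale, y, x) above a relative threshold."""
    R = np.stack([blob_response(img, s) for s in scales])
    P = np.pad(R, 1, mode="constant", constant_values=-np.inf)
    S, H, W = R.shape
    keep = R > rel_threshold * R.max()
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dz or dy or dx:
                    keep &= R > P[1 + dz:1 + dz + S, 1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
    return [(scales[k], int(y), int(x)) for k, y, x in np.argwhere(keep)]
```

For a Gaussian blob of variance 16, the normalized Laplacian selects $s = 16$. At the centre, $-s\nabla^2 L$ is proportional to $s/(16 + s)^2$, which peaks at $s = 16$.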

Furthermore, to simplify the geometric analysis of image features, we shall reduce the spatial representation of image descriptors to ellipses, by evaluating a second moment matrix

$$M = \int_{\eta \in \mathbb{R}^2} \begin{pmatrix} L_x^2 & L_x L_y \\ L_x L_y & L_y^2 \end{pmatrix} g(\eta; s_{int})\, d\eta \qquad (4)$$

at integration scale $s_{int}$ proportional to the detection scale of the scale-space maximum (equation (1)). Thereby, each image feature will be represented by a point $(x; s)$ in scale-space and a covariance matrix $\Sigma$ describing the shape, graphically illustrated by an ellipse. For one-dimensional features, the corresponding ellipses will be highly elongated, while for zero-dimensional and two-dimensional features, the ellipse descriptors of the second moment matrices will
be rather circular. Attributes derived from the covariance matrix include its 
anisotropy derived from the ratio $\lambda_{max}/\lambda_{min}$ between its eigenvalues, and its
orientation defined as the orientation of its main eigenvector. 
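Both ellipse attributes can be computed in closed form for a symmetric positive definite 2×2 matrix. The sketch below assumes $\Sigma$ is given by its entries $(\sigma_{xx}, \sigma_{xy}, \sigma_{yy})$. It uses the standard angle of the main eigenvector, $\tfrac{1}{2}\operatorname{atan2}(2\sigma_{xy}, \sigma_{xx} - \sigma_{yy})$, which lies in $(-\pi/2, \pi/2]$. The function name is hypothetical.

```python
import math


def ellipse_attributes(sxx, sxy, syy):
    """Anisotropy lambda_max / lambda_min and main-eigenvector orientation of Sigma."""
    mean = 0.5 * (sxx + syy)
    rad = math.hypot(0.5 * (sxx - syy), sxy)
    lam_max, lam_min = mean + rad, mean - rad
    if lam_min <= 0:
        raise ValueError("Sigma must be positive definite")
    orientation = 0.5 * math.atan2(2.0 * sxy, sxx - syy)   # radians, in (-pi/2, pi/2]
    return lam_max / lam_min, orientation
```

For example, a matrix with eigenvalues 4 and 1 whose main axis lies along the diagonal gives anisotropy 4 and orientation $\pi/4$.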

Figure 3 shows an example of such image descriptors computed from a grey- 
level image, after ranking on a significance measure defined as the magnitude of 
the response of the differential operator at the scale-space maximum. A trivial 
but nevertheless very useful effect of this ranking is that it substantially reduces 
the number of image features for further processing, thus improving the com- 
putational efficiency. In a more detailed representation of the multi-scale deep 
structure of a real-world image, it will often be the case that a large number of 
the image features and their hierarchical relations correspond to image structures 
that will be regarded as insignificant by later processing stages. 

3.2 Qualitative Feature Relations 

Between the abovementioned features, various types of relations can be defined 
in the image plane. Here, we consider the following types of qualitative relations: 

Spatial coincidence (inclusion): We say that a region $A$ at position $x_A$ and scale $s_A$ is in spatial coincidence relation to a region $B$ at position $x_B$ and at a (coarser) scale $s_B > s_A$ if

$$(x_A - x_B)^T\, \Sigma_B^{-1}\, (x_A - x_B) \in [D_1, D_2] \qquad (5)$$

where $D_1$ and $D_2$ are distance thresholds. By using a Mahalanobis distance measure, we introduce a directional preference which is highly useful for expressing spatial relations between elongated image features. While the special case $D_1 = 0$ corresponds to an inclusion relation, there are also cases where one may want to explicitly represent distant features, using $D_1 > 0$.

Stability of scale relations: For two image features $A$ and $B$ observed at times $t_k$ and $t_{k'}$, we assume that the ratio between their scales should be approximately the same. This is motivated by the requirement of scale invariance under zooming:

$$\frac{s_A(t_k)}{s_B(t_k)} \approx \frac{s_A(t_{k'})}{s_B(t_{k'})}$$

To accept small variations due to changes in view direction and spurious variations from the scale selection mechanism of the feature tracker, we measure relative distances in the scale direction and implement the operation as $q \approx q' \Leftrightarrow |\log(q/q')| < \log T$, where $q$ and $q'$ are the two scale ratios above and $T > 1$ is a threshold in the scale direction.

Directional relation (bearing): For a feature $A$ related to a one-dimensional feature $B$, the angle is measured between the main eigenvector of $\Sigma_B$ and the vector $x_A - x_B$ from the center $x_B$ of $B$ to the center $x_A$ of $A$.
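The three relations translate directly into predicates. The sketch below assumes 2-D positions as tuples and $\Sigma_B$ given by its entries $(\sigma_{xx}, \sigma_{xy}, \sigma_{yy})$. The sign of an eigenvector is arbitrary, so the bearing is returned as an unsigned angle in $[0, \pi/2]$; this is a design choice of the sketch rather than something the text specifies. Function names are hypothetical. The accompanying check also illustrates the invariance argument below: scaling positions by $c$ and $\Sigma_B$ by $c^2$ leaves the Mahalanobis distance unchanged.

```python
import math


def mahalanobis_sq(xa, xb, cov_b):
    """(x_A - x_B)^T Sigma_B^{-1} (x_A - x_B) with cov_b = (s_xx, s_xy, s_yy)."""
    dx, dy = xa[0] - xb[0], xa[1] - xb[1]
    a, b, c = cov_b
    det = a * c - b * b
    return (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def spatial_coincidence(xa, sa, xb, sb, cov_b, d1, d2):
    """Relation (5): B is at a coarser scale and the distance lies in [D1, D2]."""
    return sb > sa and d1 <= mahalanobis_sq(xa, xb, cov_b) <= d2


def scale_relation_stable(sa_t1, sb_t1, sa_t2, sb_t2, T):
    """|log(q / q')| < log T with q = s_A/s_B at one time and q' at the other (T > 1)."""
    q, q2 = sa_t1 / sb_t1, sa_t2 / sb_t2
    return abs(math.log(q / q2)) < math.log(T)


def bearing(xa, xb, cov_b):
    """Unsigned angle in [0, pi/2] between Sigma_B's main axis and x_A - x_B."""
    a, b, c = cov_b
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    vx, vy = xa[0] - xb[0], xa[1] - xb[1]
    norm = math.hypot(vx, vy)
    if norm == 0.0:
        raise ValueError("x_A coincides with x_B")
    cos_angle = abs(math.cos(theta) * vx + math.sin(theta) * vy) / norm
    return math.acos(min(1.0, cos_angle))
```

For a feature elongated along $x$, a point four pixels along the main axis is much "closer" in this metric than one four pixels across it. This is the directional preference described above.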
Trivially, these relations are invariant to translations and rotations in the image plane. The scale invariance of these relations follows from corresponding scale invariance properties of image descriptors computed from scale-space maxima: if the size of an image structure is scaled by a factor $c$ in the image domain, then the corresponding scale levels (which are variances) are transformed by a factor $c^2$.

3.3 Qualitative Multi-Scale Feature Hierarchy 

Let us now consider a specific example with images of a hand. From our knowledge that a hand consists of five fingers, we construct a model consisting of (i) the palm, (ii) the five fingers, and (iii) a fingertip for each finger (see figure 1).

Each finger is in a spatial coincidence relation to the palm, as well as a 
directional relation. Moreover, each fingertip is in a spatial relationship to its 
finger, and satisfies a directional relation to this feature. In a similar manner, 
each finger is in a scale stability relation with respect to the palm, and each 
fingertip is in a corresponding scale stability relation relative to its finger. 

Such a representation will be referred to as a qualitative multi-scale feature 
hierarchy. Figure 2 shows the relations this representation is built from, using 
UML notation. An attractive property of this view-based object representation 
is that it only focuses on qualitative object features. There is no assumption of 
rigidity, only that the qualitative shape is preserved. 

The idea behind this construction is of course that the palm and the fingertips 
should give rise to blob responses (equation (2)) and that the fingers give rise 
to ridge responses (equation (3)). Figure 3 shows an example of how this model 
can be initialized and matched to image data with associated image descriptors. 

To exclude responses from the background, we have here required that all 
image features should correspond to bright blobs or bright ridges. Alternatively, 
one could define spatial inclusion relations with respect to other segmentation 
cues relative to the background, e.g. chromaticity or depth. 

Here, we have constructed the graph with feature relations manually, using 
qualitative knowledge about the shape of the object and its primitives. In a 
more general setting, however, one can also consider the learning of stable fea- 
ture relations in an actual setting, based on a (possibly richer) vocabulary of 
qualitative feature relations. The list of feature relations in section 3.2 should 
by no means be regarded as exhaustive. Additional feature relations can be in- 
troduced whenever motivated by their effectiveness in specific applications. For 
example, in several cases it is natural to introduce a richer set of inter-feature 
relations between the primitives that are the ancestors of a coarser-scale feature.

4 Feature Tracking with Hierarchical Support 

One idea that we are going to make explicit use of in this paper is to let features at 
different scales support each other during feature tracking. If fine-scale features 
are lost, then the coarse scale features combined with the other fine-scale features 
should provide sufficient information so as to generate hypotheses for recapturing 
Fig. 1. A qualitative multi-scale feature hierarchy constructed for a hand model.




Fig. 2. Instance diagram for the feature hierarchy of a hand (figure 1). 




Panels: 20 strongest blobs and ridges; initialized hand model; all hand features captured.



Fig. 3. Illustration of the initialization stage of the object tracker. Once the coarse- 
scale feature is found (here the palm of the hand), the qualitative feature hierarchy 
guides the top-down search for the remaining features of the representation. (The left 
image shows the 20 most significant blob responses (in red) and ridge responses (in 
blue).) 
the lost feature. Similarly, if a coarse-scale feature is lost, e.g. due to occlusion or an overly large three-dimensional rotation, then the fine-scale features should support the model-based tracking. While this behaviour can be easily achieved
with a three-dimensional object model, we are here interested in generic feature 
trackers which operate without detailed quantitative geometric information. 

Figure 4 gives an overview of the composite object tracking scheme. The feature tracking module underlying this scheme is described in (Bretzner & Lindeberg 1998a) and consists of the evaluation of a multi-cue similarity measure involving patch correlation and the stability of scale descriptors and significance measures for image features detected according to section 3.1.



Scheme for object tracking using qualitative feature hierarchies:

Initialization:
    Find and match top-level feature using initial position and top-level
    parent-children constraints.

Tracking:
    For each frame:
        For each feature in the hierarchy (top-down):
            Track image features (see separate description)
            If a feature is lost (or not found)
                If parent matched
                    Find feature using parent position and parent-feature
                    relation constraints
                else if child(ren) matched
                    Find feature using child(ren) position and feature-children
                    relation constraints.
        Parse feature hierarchy, verify relations and reject mismatches.



Fig. 4. Overview of the scheme for object tracking with hierarchical support. 
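The following minimal Python sketch makes the control flow of Fig. 4 concrete. Several pieces are hypothetical simplifications introduced only for illustration: the `Feature` class, the fixed offset-and-tolerance relation between parent and child, the `track_fn` callback that stands in for the individual feature tracker, and the list of candidate detections. The relations in the actual hierarchy are qualitative, not fixed offsets. For clarity, the sketch first runs all individual trackers and then recaptures lost features top-down, which is one possible reading of the figure.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass(eq=False)
class Feature:
    """Node of a hypothetical feature hierarchy."""
    name: str
    offset: Point = (0.0, 0.0)   # assumed relation: expected position relative to parent
    tolerance: float = 10.0      # assumed radius of the search region
    children: List["Feature"] = field(default_factory=list)
    parent: Optional["Feature"] = field(default=None, repr=False)
    position: Optional[Point] = None

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

def top_down(root):
    """Breadth-first (coarse-to-fine) ordering of the hierarchy."""
    order, queue = [], [root]
    while queue:
        f = queue.pop(0)
        order.append(f)
        queue.extend(f.children)
    return order

def nearest(candidates, predicted, radius):
    """Closest candidate detection inside the search region, or None."""
    inside = [c for c in candidates if math.dist(c, predicted) <= radius]
    return min(inside, key=lambda c: math.dist(c, predicted), default=None)

def predict(f):
    """Predicted position from the matched parent, else from the matched children."""
    p = f.parent
    if p is not None and p.position is not None:
        return (p.position[0] + f.offset[0], p.position[1] + f.offset[1])
    kids = [c for c in f.children if c.position is not None]
    if not kids:
        return None
    xs = [c.position[0] - c.offset[0] for c in kids]
    ys = [c.position[1] - c.offset[1] for c in kids]
    return (sum(xs) / len(kids), sum(ys) / len(kids))

def track_frame(root, candidates, track_fn):
    features = top_down(root)
    for f in features:                  # individual feature trackers
        f.position = track_fn(f)        # None means the feature was lost
    for f in features:                  # top-down recapture of lost features
        if f.position is None:
            pred = predict(f)
            if pred is not None:
                f.position = nearest(candidates, pred, f.tolerance)
    for f in features:                  # verify relations, reject mismatches
        p = f.parent
        if f.position is not None and p is not None and p.position is not None:
            pred = (p.position[0] + f.offset[0], p.position[1] + f.offset[1])
            if math.dist(f.position, pred) > f.tolerance:
                f.position = None
    return {f.name: f.position for f in features}
```

For example, with a palm feature and a single finger child at offset (0, -30), a finger that its own tracker lost is recaptured near the palm. A palm that was lost is recaptured from the finger's position.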



4.1 Sample Application I — The 3-D Hand Mouse 

From the trajectories of image features, we can compute the motion of the
hand, assuming that the hand is kept rigid. An application that we are
particularly interested in is to use such hand gestures for controlling other
computerized equipment. Examples of applications include (i) interaction with
visualization systems and virtual environments, (ii) control of mechanical
systems and (iii) immaterial remote control functionality for consumer electronics
(Lindeberg & Bretzner 1998). The mathematical foundation for this “3-D hand
mouse” was presented in (Bretzner & Lindeberg 1998b). Our previous
experimental work, however, was done with image sequences where an individual
feature tracker with automatic scale selection (Bretzner & Lindeberg 1998a) was
(Panels: steady-state model; one feature disappears; feature recaptured.)



Fig. 5. The proposed qualitative representation makes it possible to maintain tracking
even if parts of the object are occluded. Later in the sequence, the occluded part (in
this case the finger) can be captured again using the feature hierarchy. (Here, all image
features are shown in red, while the feature trajectories are shown in green.)



(Panels: steady-state model; fine-scale features occluded; all features captured.)




Fig. 6. Illustration of how the qualitative feature hierarchy makes it possible to
maintain object tracking under view variations. The images show how most finger features
are lost due to occlusion when the hand turns, and how the qualitative feature hierarchy
guides the search to find these features again.



(Panel title: the behaviour of the qualitative feature hierarchy tracker under semi-rigid motion.)




Fig. 7. Due to the qualitative nature of the feature relations, the proposed framework 
allows objects to be tracked under semi-rigid motion. 
sufficient to obtain the extended feature trajectories needed for structure and
motion computations.

The qualitative feature hierarchy provides a useful tool for extending this
functionality by making the system less sensitive to spurious errors in individual
feature tracking. Figures 5-6 show two examples of how the qualitative feature
hierarchy supports the recapturing of lost image features. Figure 7 demonstrates
the ability of this view-based image representation to handle non-rigid motions.

While the image representation underlying these computations is view-based,
it should be remarked that it is only a small step to a three-dimensional object
model. If the hand is kept rigid over a sufficiently large three-dimensional
rotation, we can use the motion information in the feature trajectories of the
fingers and the fingertips for computing the structure and the motion of the
object (see Bretzner & Lindeberg 1998b for details).

4.2 Sample Application II — View-Based Face Model 

Figure 8 shows an example of how a qualitative feature hierarchy can support
the tracking of blob features and ridge features extracted from images of a face.
Again, a main purpose is to recapture lost features after occlusions.



(Panels: steady-state model; occlusion by rotation; features recaptured.)




Fig. 8. Results of building a qualitative feature hierarchy for a face model consisting 
of blob features and ridge features at multiple scales and applying this representation 
to the tracking of facial features over time. 



5 Summary and Discussion 

We have presented a view-based image representation, called the qualitative 
multi-scale feature hierarchy, and shown how this representation can be used for 
improving the performance of a feature tracker, by defining search regions in 
which lost features can be detected again. 

Besides making explicit use of the hierarchical relations that are induced by
different features in a multi-scale representation, the philosophy behind this
approach is to build an internal representation that supports the processing of those
image descriptors we can expect to extract from image data. This knowledge is
represented in a qualitative manner, without the need to construct geometrically
detailed object models.

In relation to other graph-like object representations, the discriminative power
of the qualitative feature hierarchy may of course be lower than that of
geometrically more accurate three-dimensional object models or more detailed
view-based representations involving quantitative information. The qualitative
feature hierarchies may therefore be less suitable for object recognition, but still
adequate for pre-segmentation of complex scenes, or as a complement for filling in
missing information given partial information from other modules (here the individual
feature trackers). Notably, the application of this concept does not suffer from
the complexity problems of approaches involving explicit graph matching.

We do not claim that the proposed framework excludes more traditional object
representations, such as three-dimensional object models or view-based
representations. Rather, different types of representations could be used in a
complementary manner, exploiting their respective advantages. For example, in
certain applications it is natural to complement the qualitative feature hierarchy
with a view-based representation at the feature level, in order to enable more
reliable verification of the image features. Moreover, regarding our application to
the 3-D hand mouse, it is worth pointing out that the qualitative feature hierarchy
is used as a major tool in a system for computing three-dimensional structure and
motion, thus ultimately deriving a quantitative three-dimensional object model
from image data.

The main advantages of the proposed approach are that it is very simple
to implement in practice, and that it allows us to handle semi-rigid objects,
occlusions, as well as variations in view direction and illumination conditions.
Specifically, with respect to the topic of scale-space theory, we have demonstrated
how an integrated computer vision application with non-trivial functionality can
be constructed essentially from just the following components: (i) basic
scale-space operations (see section 3.1), (ii) a straightforward graph representation,
and (iii) a generic framework for multi-view geometry (described elsewhere).
Abstract. The method of curve evolution is a popular method for recovering
shape boundaries. However, isotropic metrics have always been used to induce
the flow of the curve, and potential steady states tend to be difficult to determine
numerically, especially in noisy or low-contrast situations: initial curves shrink
past the steady state and soon vanish. In this paper, anisotropic metrics are
considered, which remedy the situation by taking the orientation of the feature
gradient into account. The problem of shape recovery or segmentation is
formulated as the problem of finding minimum cuts of a Riemannian manifold.
Two approximate methods, namely anisotropic geodesic flows and the solution of an
eigenvalue problem, are discussed.

1 Introduction 

In recent years, there has been extensive development of methods for shape
recovery by curve evolution. These methods are gaining in popularity due to their
potential for very fast implementation. A parametric form was developed by
Kass, Witkin and Terzopoulos [6]. A geometrically intrinsic formulation of active
contours was introduced by Caselles, Catté, Coll and Dibos in [2] and developed
over the years by several authors [3,7,8,14]. A formulation based on curve evolution
introduced by Sethian [13] is also in use, in which the flow velocity consists of a constant
component and a component proportional to the curvature (see, for example, [18]).
The evolving curve in this case is stopped, or at least slowed, near the shape boundary
by means of a stopping term. From a geometric perspective, the image domain may
be viewed as a Riemannian manifold endowed with a metric defined by the image
features. An initial curve flows towards a geodesic with normal velocity proportional
to its geodesic curvature. Several techniques for fast implementation of geodesic flows
have been developed. The speed of the method is due to two essential factors. First,
noise suppression and edge detection are done in a hierarchical fashion: the image is
smoothed first and then the geodesic flow is calculated. This is in contrast to the flows
defined by segmentation functionals, in which noise suppression and edge detection are
done simultaneously. Second, the object boundaries are found by tracking one closed
curve at a time, so the computational effort can be focused on a small neighborhood
of the evolving curve.

Throughout the development of this approach, the metric used has always been
isotropic. In this paper, fully general anisotropic metrics are considered. One
reason for developing such a generalization is that in noisy or low-contrast situations,
the steady states for the isotropic flows are not robust and one has to resort to devices
such as a stopping term. For instance, when the image gradient is large everywhere
due to noise, curves with the same Euclidean length will have approximately equal
Riemannian length if the metric is isotropic, indicating reduced sensitivity
of the method. In practice, the curves tend to shrink continuously and vanish. One
way to improve this situation is to take the orientation of the gradient into account
by considering anisotropic metrics. Another reason to consider anisotropic metrics
comes from the impressive results obtained by Shi and Malik [17], who formulate the
problem of shape recovery in natural scenes as the problem of finding the minimum cut
in a weighted graph. An ingredient essential for their method to work is the implied
anisotropic metric. Finally, anisotropic metrics are implied in boundary detection
by means of segmentation functionals. This connection is briefly reviewed in Section
2. However, its implementation is computationally expensive, and it is worthwhile to
formulate anisotropic curve evolution directly.

2 Segmentation Functionals and Curve Evolution 

Consider the segmentation functional [11] 



$$E(u, C) = \iint_{D \setminus C} \|\nabla u\|^2 \, dx_1\, dx_2 + \mu^2 \iint_{D} (u - I)^2 \, dx_1\, dx_2 + \nu\, |C| \qquad (1)$$



where D is the image domain, I is the image intensity, C is the segmenting
curve, |C| its length and u is a piecewise smooth approximation of I. Let
e = μ²(u − I)² + ‖∇u‖² denote the energy density. Then, with u fixed, the gradient
flow for C is given by the equation



$$\frac{\partial C}{\partial t} = \left(e^{+} - e^{-} + \nu\kappa\right) N \qquad (2)$$



where C now denotes the position vector of the segmenting curve, the superscripts + and −
denote the values on the two sides of C, N is the normal to C pointing towards the
side of C marked +, and κ denotes the curvature. To see the anisotropy, consider the
limiting case μ → ∞. Then C minimizes the limiting functional



( 3 ) 





ds 



The metric is an anisotropic (non-Riemannian) Finsler metric. It is singular and
non-definite, exhibiting space-like and time-like behaviors [11]. The existence of
space-like geodesics is an open question.

At the other extreme, as μ → 0, the behavior is governed by the isotropic Euclidean
metric. The curve minimizes

$$\mu^2 \sum_i \iint_{D_i} \left(I - \bar{I}_i\right)^2 dx_1\, dx_2 + \nu\, |C| \qquad (4)$$

where the D_i are the segments of D and Ī_i is the average value of I in D_i. The curve
evolution is given by the equation

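$$\frac{\partial C}{\partial t} = \left[\mu^2\left(\left(I - \bar{I}^{+}\right)^2 - \left(I - \bar{I}^{-}\right)^2\right) + \nu\kappa\right] N \qquad (5)$$

where Ī⁺ and Ī⁻ are the averages of I on the two sides of C.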



The equation is similar to the one used in [18] where the first term is replaced by 
a constant. The advantage of using the segmentation functional is that it avoids the 
problem of choosing this constant. 

Another segmentation functional that leads to isotropic curve evolution is
formulated using L¹-norms [15]:

$$E(u, C) = \iint_{D \setminus C} \|\nabla u\| \, dx_1\, dx_2 + \mu \iint_{D} |u - I| \, dx_1\, dx_2 + \int_C \frac{J_u}{1 + \nu J_u}\, ds \qquad (6)$$



where J_u is the jump in u across C, that is, J_u = |u⁺ − u⁻|. In order to implement
the functional by gradient descent, the curve C is replaced by a continuous function v,
the edge-strength function:

$$E_\rho(u, v) = \iint_D \left\{ (1 - v)^2 \|\nabla u\| + \mu |u - I| + \frac{\rho}{2}\|\nabla v\|^2 + \frac{v^2}{2\rho} \right\} dx_1\, dx_2 \qquad (7)$$



The gradient descent equations for u and v are: 
where curv(u) is the curvature of the level curves of u:



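$$\operatorname{curv}(u) = \frac{u_{x_2}^2\, u_{x_1 x_1} - 2\, u_{x_1} u_{x_2}\, u_{x_1 x_2} + u_{x_1}^2\, u_{x_2 x_2}}{\left(u_{x_1}^2 + u_{x_2}^2\right)^{3/2}} \qquad (9)$$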

The term in brackets in the first equation prescribes the three components of
the velocity with which the level curves of u move. The first term is the usual
Euclidean curvature term except for the factor (1 − v)², the second term is the
advection induced by the edge-strength function v, and the last term prescribes the
constant component of the velocity. The sign is automatically chosen such that this
component of the velocity pushes the level curve towards the corresponding level curve
of I. The implied metric is isotropic.
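The level-curve curvature curv(u) = (u_{x2}² u_{x1x1} − 2 u_{x1} u_{x2} u_{x1x2} + u_{x1}² u_{x2x2}) / ‖∇u‖³ is straightforward to evaluate with finite differences. The sketch below makes a few assumptions: a regular grid with spacing h, array axis 0 as x1 and axis 1 as x2, and `np.gradient` for the derivatives (central differences in the interior). The small `eps` that guards flat regions is our own addition.

```python
import numpy as np

def level_curve_curvature(u, h=1.0, eps=1e-12):
    """Euclidean curvature of the level curves of u, evaluated on a grid.

    Implements (u_y^2 u_xx - 2 u_x u_y u_xy + u_x^2 u_yy) / (u_x^2 + u_y^2)^(3/2),
    with axis 0 taken as x1 and axis 1 as x2.
    """
    u = np.asarray(u, dtype=float)
    ux, uy = np.gradient(u, h)        # first derivatives along axis 0 and axis 1
    uxx, uxy = np.gradient(ux, h)     # second derivatives of u_x
    _, uyy = np.gradient(uy, h)
    num = uy**2 * uxx - 2.0 * ux * uy * uxy + ux**2 * uyy
    den = (ux**2 + uy**2) ** 1.5
    return num / np.maximum(den, eps)  # eps avoids division by zero where grad u = 0
```

For u = x1² + x2², whose level curves are circles, the value at the grid point (3, 4) is approximately 0.2, the reciprocal of the radius 5.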
3 Anisotropic Geodesic Flows 



It is helpful to start with a slightly more general framework to derive the equation
of anisotropic geodesic flow. Let M denote the image domain D endowed with a
Riemannian metric g = {g_ij}. Let C be a curve dividing M into two disjoint
submanifolds, M_1 and M_2. Following Cheeger [4], define



$$h(C) = \frac{L(C)}{\min_i A(M_i)} \qquad (10)$$



where L(C) is the length of C and A(M_i) is the area of M_i, both measured
with respect to the metric on M. The problem of shape recovery may then be viewed
as the problem of finding the minimum cut of M by minimizing h(C). (Note that, because
of the term in the denominator, the minimum cut depends on the size and shape of the
image domain.) The gradient flow obtained by calculating the first variation of h(C)
is given by the equation

$$\frac{\partial C}{\partial t} = \left[\kappa_g \pm h(C)\right] N_g \qquad (11)$$

where κ_g is now the geodesic curvature and N_g is the normal defined by the metric;
the plus sign is to be used if the area bounded by the curve is smaller than its complement,
minus otherwise. In the isotropic case with the metric equal to a scalar function 6 
times the identity metric, the relation between the geodesic curvature Kg and the 
Euclidean curvature k is given by the equation 



( 12 ) 
Thus the geodesic curvature includes the advection term. The term h(C) is the
component of the velocity which is constant along the curve and varies in magnitude
as the curve moves. To implement the flow, the initial curve is embedded as a level
curve of a function u, and the evolution equation for u is derived such that its level
curves move with velocity proportional to their geodesic curvature augmented by
a component β, constant along each level curve. If only the motion of the original
curve C is of interest, we may assume that all the level curves have the same constant
component β equal to h(C), updated continuously as C evolves. However, if the motion
of all the level curves is of interest, then the value of h for each level curve must
be calculated, making the implementation considerably more difficult. In this paper,
purely anisotropic geodesic flow is studied by setting β = 0.
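The ratio h(C) = L(C)/min(A(M_1), A(M_2)) depends on the size of the domain through its denominator. The following sketch shows this for a polygonal curve, under a simplifying assumption: it uses Euclidean length and (shoelace) area in place of the quantities measured in the metric g.

```python
import math

def polygon_length_area(vertices):
    """Perimeter and absolute shoelace area of a closed polygon given as (x, y) pairs."""
    n = len(vertices)
    length = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        length += math.hypot(x1 - x0, y1 - y0)
        twice_area += x0 * y1 - x1 * y0
    return length, abs(twice_area) / 2.0

def cheeger_ratio(vertices, domain_area):
    """h(C) = L(C) / min(A(M1), A(M2)), with Euclidean length and area (an assumption)."""
    length, inside = polygon_length_area(vertices)
    return length / min(inside, domain_area - inside)
```

A unit square in a domain of area 10 gives h = 4/1 = 4. A square of side 2 in a domain of area 5 gives h = 8/min(4, 1) = 8, because the complement is now the smaller part.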

The functional for u may be derived using the coarea formula, taking care to define
all the quantities involved in terms of the metric g. Let g^{-1} = {g^{ij}} be the metric
dual to g, given by the inverse of the matrix {g_ij}. Let

$$\langle X, Y \rangle_A = \sum_{i,j} X_i A_{ij} Y_j \qquad (13)$$

be the bilinear form defined by a given matrix A, and let

$$\|X\|_A = \sqrt{\langle X, X \rangle_A} \qquad (14)$$

Then the functional for u may be obtained by the coarea formula and has the form 



$$E(u) = \int_D \left(\|\nabla u\|_{g^{-1}} - \beta u\right) \sqrt{\det(g)}\; dx \qquad (15)$$

where β is assumed to be constant. Here, ∇u is the Euclidean gradient vector {u_{x_i}}.
(In fact, g^{-1}∇u is the gradient vector ∇_g u defined by the metric g, and
‖∇_g u‖²_g = ‖∇u‖²_{g^{-1}}.) The evolution equation for u has the form



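$$\frac{\partial u}{\partial t} = \operatorname{div}_g\!\left(\frac{g^{-1}\nabla u}{\|\nabla u\|_{g^{-1}}}\right) + \beta \qquad (16)$$

where

$$\operatorname{div}_g(X) = \frac{1}{\sqrt{\det(g)}} \sum_i \partial_i\!\left(X_i \sqrt{\det(g)}\right) \qquad (17)$$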

is the divergence operator defined with respect to g. The equations (16) and (17) are 
valid in arbitrary dimension. The first term in Equation (16) is the mean geodesic 
curvature of the level hypersurfaces of u. 
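The following sketch gives discrete versions of three quantities: the bilinear form ⟨X, Y⟩_A, the norm ‖X‖_A, and the metric divergence div_g(X) = (1/√det g) Σ_i ∂_i(X_i √det g). It combines them into the mean geodesic curvature div_g(g⁻¹∇u / ‖∇u‖_{g⁻¹}) of the level curves of u. Several choices are ours and only illustrative: the metric is stored as an array of shape (2, 2, N, M), derivatives use `np.gradient`, and a small `eps` guards against vanishing gradients.

```python
import numpy as np

def bilinear(X, A, Y):
    """<X, Y>_A = sum_ij X_i A_ij Y_j, pointwise; X, Y have shape (2, ...), A (2, 2, ...)."""
    return np.einsum("i...,ij...,j...->...", X, A, Y)

def norm_A(X, A):
    """||X||_A = sqrt(<X, X>_A), pointwise."""
    return np.sqrt(bilinear(X, A, X))

def div_g(X, g, h=1.0):
    """Divergence with respect to g: (1/sqrt(det g)) * sum_i d_i(sqrt(det g) X_i)."""
    s = np.sqrt(g[0, 0] * g[1, 1] - g[0, 1] * g[1, 0])
    return (np.gradient(s * X[0], h, axis=0) + np.gradient(s * X[1], h, axis=1)) / s

def geodesic_curvature_term(u, g, h=1.0, eps=1e-12):
    """div_g(g^{-1} grad u / ||grad u||_{g^{-1}}) on a 2-D grid."""
    grad = np.stack(np.gradient(np.asarray(u, dtype=float), h))
    g_inv = np.moveaxis(np.linalg.inv(np.moveaxis(g, (0, 1), (-2, -1))), (-2, -1), (0, 1))
    norm = np.maximum(norm_A(grad, g_inv), eps)
    X = np.einsum("ij...,j...->i...", g_inv, grad) / norm
    return div_g(X, g, h)
```

With the Euclidean metric, the last function reduces to the Euclidean curvature of the level curves. For u = x1² + x2² on a grid of spacing 0.1, it returns roughly 0.2 at the point (3, 4). A constant conformal factor leaves div_g unchanged, since √det g cancels.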

In dimension 2, the evolution equation (16) for u assumes a fairly simple form and
is not much more difficult to implement than in the isotropic case. After multiplying
the right-hand side of Equation (16) by a positive function, we get



du 

(18) = curv(u)\\V u\ \ + 



Tl|V«llK-fl|V«llQ-/g||V«llKy^^ 

||VM||^(ieI(/T) 



where, as before, curv(u) is the Euclidean curvature of the level curves of u, ∇u is
the Euclidean gradient of u, ‖∇u‖ is its Euclidean norm, and



(19) 



K = det{g)g ^ 

T = J2^^KijdjU 

Q ij — A V U j V A ij ^ 



Comparison with the corresponding equation for the isotropic flow shows that
anisotropy does not affect the second-order curvature term, but the advection term
is more finely tuned. For anisotropy to affect the second-order term in
dimension 2, more general Finsler metrics must be considered [1].
4 Approximation: Riemannian Drums 

As in the isotropic case, Equation (18) is hyperbolic in the direction normal to the 
level curves so that its solution is capable of developing shocks. A shock-capturing 
numerical method [12] must be used to implement the equation. An alternative is to 
convert the minimum-cut problem into an eigenvalue problem as suggested by the 
Cheeger inequality 

$$\lambda \ge \frac{1}{4}\left(\min_C h(C)\right)^2 \qquad (20)$$

where λ is the second smallest eigenvalue of the Laplace-Beltrami operator.
Therefore, instead of minimizing h(C), consider minimizing the Rayleigh quotient

$$\frac{\int_D \|\nabla u\|^2_{g^{-1}} \sqrt{\det(g)}\; dx}{\int_D u^2 \sqrt{\det(g)}\; dx} \qquad (21)$$

which is equivalent to solving the eigenvalue problem

$$\Delta_g u + \lambda u = 0, \quad \text{Neumann boundary conditions} \qquad (22)$$

where

$$\Delta_g u = \frac{1}{\sqrt{\det(g)}} \sum_{i,j} \partial_i\!\left(\sqrt{\det(g)}\, g^{ij}\, \partial_j u\right) \qquad (23)$$

is the Laplace-Beltrami operator. When g is the Euclidean metric, the operator reduces
to the ordinary Laplacian and the eigenvalue problem describes the modes of vibration 
of an elastic membrane. When discretized, the eigenvalue problem takes the form 



$$H u = \lambda M u \qquad (24)$$



where u is now a vector, H is the “stiffness” matrix and M is the mass matrix. 
An important point to note is that Equation (22) does not involve β; its approximate
value is determined automatically by the Cheeger inequality. Another important point
is that for the approximation to work, an anisotropic metric is essential. In
dimension 2, if the metric is isotropic, the numerator in the Rayleigh quotient is
independent of g. Since we expect g to deviate substantially from the Euclidean
metric only near the shape boundary, the denominator is insensitive to g as well. As
a result, the eigenvalue problem reduces essentially to the Euclidean case.
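A toy version of the discretized problem Hu = λMu shows how the second eigenvector separates regions. The sketch builds the stiffness matrix of a one-dimensional chain with free (Neumann) ends. A small edge conductance plays the role of a metric that makes crossing a boundary expensive. The mass matrix is diagonal (lumped). The chain model and the lumped mass are our simplifications, not the paper's discretization. Scaling by M^{-1/2} on both sides keeps the matrix symmetric, so `numpy.linalg.eigh` applies.

```python
import numpy as np

def stiffness_1d(conductance):
    """Neumann stiffness matrix of a chain: edge k joins nodes k, k+1 with weight conductance[k]."""
    n = len(conductance) + 1
    H = np.zeros((n, n))
    for k, c in enumerate(conductance):
        H[k, k] += c
        H[k + 1, k + 1] += c
        H[k, k + 1] -= c
        H[k + 1, k] -= c
    return H

def second_mode(H, mass):
    """Second smallest eigenpair of H u = lam M u, with M = diag(mass) > 0."""
    m = np.sqrt(np.asarray(mass, dtype=float))
    A = H / np.outer(m, m)           # M^{-1/2} H M^{-1/2}: still symmetric
    lam, V = np.linalg.eigh(A)       # eigenvalues in ascending order
    return lam[1], V[:, 1] / m       # map back: u = M^{-1/2} v
```

With uniform conductances, the second eigenvalue of an 8-node chain is 2 − 2cos(π/8), the first Neumann mode of a uniform string. With a weak link in the middle, the eigenvalue becomes small and the eigenvector changes sign exactly at the weak link.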

The eigenvalue problem (22) is an analytic version of the formulation proposed
by Shi and Malik in the framework of graph theory, motivated by the principles of
gestalt psychology. They regard the image as a weighted graph by viewing the pixels
as vertices and assigning weights to the edges in proportion to the proximity of the
corresponding vertices and the similarity between the feature values at these vertices.
The minimum cut of the graph is defined in a normalized way. There is a standard way




Riemannian Drums, Anisotropic Curve Evolution and Segmentation 135 

to define the Laplacian of a graph from its adjacency matrix [9,10] and to approximate the
minimum cut problem as the problem of determining the eigenvector corresponding
to the second smallest eigenvalue of this Laplacian. Since this eigenvalue is zero if
the graph is disconnected, it is called the algebraic connectivity of the graph. For a
more detailed comparison between the graph-theoretic formulation and the formulation
presented here, see [16]. Note that here too, anisotropy is essential. Isotropy
would mean that all the edges have the same weight, and hence the graph could not carry
any information about the image.
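The graph Laplacian and its algebraic connectivity take only a few lines. The weight formula in `similarity_weights` is an assumed, illustrative choice in the spirit of the description above: a Gaussian in feature difference times a Gaussian in distance, cut off at a radius, for a one-dimensional signal. It is not the exact weighting used by Shi and Malik.

```python
import numpy as np

def graph_laplacian(W):
    """L = D - W for a symmetric, nonnegative adjacency matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def algebraic_connectivity(W):
    """Second smallest eigenvalue of the graph Laplacian."""
    return np.linalg.eigvalsh(graph_laplacian(W))[1]

def similarity_weights(values, positions, sigma_i=0.1, sigma_x=2.0, radius=2.0):
    """Hypothetical weights: feature similarity times proximity, zero beyond radius."""
    v = np.asarray(values, dtype=float)
    p = np.asarray(positions, dtype=float)
    dv = v[:, None] - v[None, :]
    dx = p[:, None] - p[None, :]     # 1-D positions for brevity
    W = np.exp(-dv**2 / sigma_i**2) * np.exp(-dx**2 / sigma_x**2)
    W[np.abs(dx) > radius] = 0.0
    np.fill_diagonal(W, 0.0)
    return W
```

For a step signal, the edges across the step get negligible weight, and the algebraic connectivity drops essentially to zero, as for a disconnected graph. For a constant signal, the weights depend on distance alone, so the graph carries no information about the intensities.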

The eigenvalue problem may be approximately solved by one of the special
methods for large sparse matrices, such as the Lanczos method [5]. Care must
be taken to ensure that the matrix H is symmetric so that the Lanczos method
is applicable. One way to ensure this is to derive Equation (24) by
discretizing the Rayleigh quotient (21) instead of Equation (22). Details of the
Lanczos method may be found in [16]. The method is an efficient procedure
to find the vector which minimizes the Rayleigh quotient over the vector space
K_m spanned by {u_0, M^{-1}Hu_0, (M^{-1}H)²u_0, …, (M^{-1}H)^m u_0}. Here, u_0 is a
user-supplied initial vector and m is chosen so that satisfactory numerical convergence
is obtained. In principle, the only requirement for the method to work is that the
initial vector must have a component along the true eigenvector. However, the more
higher eigenvectors are significantly present in u_0, the larger the value of m needed
for the method to converge to the second eigenvector. Moreover, as m increases, it
becomes harder and harder to orthogonalize the vector space K_m as required by the
Lanczos method. Therefore, the choice of the initial vector is a non-trivial problem.
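The following sketch shows plain Lanczos iteration with full reorthogonalization for a symmetric matrix A. It assumes the standard symmetric case, for instance A = H when M = I, or the scaled matrix M^{-1/2} H M^{-1/2}, rather than the M^{-1}H form written above. After m steps the basis spans u_0, A u_0, …, A^{m−1} u_0, so the indexing differs by one from K_m. Repeating the projection twice is one common remedy for the loss of orthogonality mentioned in the text. The breakdown threshold 1e-12 is an arbitrary choice.

```python
import numpy as np

def lanczos(A, u0, m):
    """m Lanczos steps with full reorthogonalization on a symmetric matrix A.

    Returns the Ritz values (ascending) and the orthonormal Krylov basis Q.
    """
    n = len(u0)
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    q = u0 / np.linalg.norm(u0)
    for k in range(m):
        Q[:, k] = q
        w = A @ q
        alpha[k] = q @ w
        # project out all previous basis vectors (twice, for numerical stability)
        w -= Q[:, : k + 1] @ (Q[:, : k + 1].T @ w)
        w -= Q[:, : k + 1] @ (Q[:, : k + 1].T @ w)
        if k + 1 < m:
            beta[k] = np.linalg.norm(w)
            if beta[k] < 1e-12:      # invariant subspace reached: stop early
                Q, alpha, beta = Q[:, : k + 1], alpha[: k + 1], beta[:k]
                break
            q = w / beta[k]
    off = beta[: len(alpha) - 1]
    T = np.diag(alpha) + np.diag(off, 1) + np.diag(off, -1)   # tridiagonal Q^T A Q
    return np.linalg.eigvalsh(T), Q
```

With m = n and no breakdown, the Ritz values coincide with the eigenvalues of A. For m < n, the smallest Ritz value is an upper bound on the smallest eigenvalue. If u_0 is itself an eigenvector, the iteration stops after one step.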



5 Anisotropic Metrics 



In dimension 2, the obvious starting point for intensity images is the matrix 



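$$\nabla I^\sigma \otimes \nabla I^\sigma = \begin{pmatrix} \partial_1 I^\sigma\, \partial_1 I^\sigma & \partial_1 I^\sigma\, \partial_2 I^\sigma \\ \partial_1 I^\sigma\, \partial_2 I^\sigma & \partial_2 I^\sigma\, \partial_2 I^\sigma \end{pmatrix} \qquad (25)$$

where I^σ denotes the smoothing of the image by a Gaussian filter. There are two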
problems with the metric defined in this way. First, the metric is degenerate,
since the determinant is zero. This may be remedied by adding a small multiple
of the identity matrix to the above matrix. (Shi and Malik solve this problem by
exponentiating the metric.) The second objection is that the length of each level
curve of I^σ is just a constant multiple of its Euclidean length. Since we expect the
object boundaries to coincide more or less with the level curves, evolving curves will
shrink and vanish. A solution to this problem is to divide the augmented matrix by
its determinant. The final result is the metric given by the matrix

$$g_{ij} = \frac{\delta_{ij} + a\, \partial_i I^\sigma\, \partial_j I^\sigma}{1 + a\,\|\nabla I^\sigma\|^2} \qquad (26)$$

where a is a constant. Finally, just as in the isotropic case [14], the metric may be
raised to some power p. The effect of p is to sharpen the maxima and the minima of the
smaller eigenvalue of the metric over the image domain, resulting in sharper edges and
corners. In the gradient direction, the infinitesimal length is the Euclidean arclength
ds, independent of the gradient. Along the level curves of I^σ, the infinitesimal length
is ds/(1 + a‖∇I^σ‖²)^{1/2}. Thus the metric provides a generalization compatible with
the isotropic case as it is usually formulated [14]. Its generalization to vector-valued
images, for instance to the case where we have a set of transforms I_k of the image
by a bank of filters, is straightforward. In matrix (26), simply replace ∂_i I^σ ∂_j I^σ by
a weighted sum Σ_k w_k ∂_i I_k^σ ∂_j I_k^σ. Generalization to arbitrary dimension n is
obtained by letting the indices i, j run from 1 to n and normalizing the metric by
dividing it by the (n − 1)th root of its determinant. Of course, determining the weights
is a difficult problem.
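To compute a metric of this kind from an image, one can smooth, differentiate and normalize pixelwise. The sketch assumes the unpowered case p = 1 and the form g_ij = (δ_ij + a ∂_i I^σ ∂_j I^σ)/(1 + a‖∇I^σ‖²). This is the identity plus a multiple of matrix (25), divided by its determinant, 1 + a‖∇I^σ‖². The separable Gaussian with a truncated kernel and reflect padding is our own choice. The default standard deviation √2 follows the experiments below.

```python
import numpy as np

def gaussian_smooth(image, sigma):
    """Separable Gaussian smoothing (truncated, normalized kernel; reflect padding)."""
    r = max(1, int(3 * sigma + 0.5))
    t = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-t**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.asarray(image, dtype=float)
    for axis in (0, 1):
        pad = [(r, r) if a == axis else (0, 0) for a in range(2)]
        padded = np.pad(out, pad, mode="reflect")
        out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), axis, padded)
    return out

def anisotropic_metric(image, sigma=np.sqrt(2.0), a=1.0):
    """g_ij = (delta_ij + a d_i I d_j I) / (1 + a ||grad I||^2), I smoothed by a Gaussian."""
    Is = gaussian_smooth(image, sigma)
    gx, gy = np.gradient(Is)
    det = 1.0 + a * (gx**2 + gy**2)  # determinant of the augmented matrix
    g = np.empty((2, 2) + Is.shape)
    g[0, 0] = (1.0 + a * gx * gx) / det
    g[0, 1] = g[1, 0] = (a * gx * gy) / det
    g[1, 1] = (1.0 + a * gy * gy) / det
    return g
```

For a linear ramp of slope 2 along axis 0, the interior metric has eigenvalue 1 across the ramp and 1/(1 + 4a) along its level lines. This matches the Euclidean length across the gradient and the shortened length along level curves.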



6 Experiments 

In the first experiment, different methods considered here are compared from the 
point of view of smoothing intensity images. In the second experiment, in addition 
to smoothing an MR image, anisotropic flow is applied to smoothing of the zero- 
crossings of the Laplacian of the image presmoothed by a Gaussian. 

In these experiments, the constant β was set equal to zero, and a was chosen so
that the smallest value achieved by the smaller eigenvalue of the matrices {g_ij} over
the image domain was equal to a small constant c less than 1. The closer the value
of c is to 1, the closer the metric is to the Euclidean metric. (The Euclidean geodesic
flow is a purely curvature-driven flow without advection; the image is eventually
smoothed out to uniform intensity.) In the case of the eigenvalue problem, the closer
the value of c is to 1, the more the behavior resembles that of a Euclidean drum, and the
second eigenvector is dominated by the fundamental mode of Euclidean vibration.
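The choice of a can be made explicit under one assumption: the metric has the normalized form (δ_ij + a ∂_i I^σ ∂_j I^σ)/(1 + a‖∇I^σ‖²) with p = 1. Its eigenvalues at a pixel are then 1 across the level curve and 1/(1 + a‖∇I^σ‖²) along it. The smallest value of the smaller eigenvalue is therefore 1/(1 + aG), where G = max ‖∇I^σ‖². Setting it equal to c gives a = (1/c − 1)/G. The helper below computes this; the error handling is our own addition.

```python
import numpy as np

def choose_a(grad_sq_norm, c):
    """Return a such that min over the domain of 1/(1 + a*||grad I||^2) equals c, 0 < c < 1."""
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie in (0, 1)")
    gmax = float(np.max(grad_sq_norm))
    if gmax == 0.0:
        raise ValueError("constant image: no value of a achieves c < 1")
    return (1.0 / c - 1.0) / gmax
```

For squared gradient magnitudes {0, 1, 4} and c = 0.2, this gives a = 4/4 = 1, and the smallest eigenvalue is 1/(1 + 4) = 0.2.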

In order to bring out the differences among the methods clearly, the first
experiment uses a synthetic image with greatly exaggerated noise. The image
is shown in Figure 1a (top-left) and was created by adding noise to a white ellipse
on a black background. The top-right frame shows a horizontal and a vertical
cross-section through the middle of the image. The metric was calculated from the
filtered image obtained by filtering the original image with a Gaussian with standard
deviation equal to √2. This filtered image I^σ was also used as the initial vector for the
geodesic flow as well as for the Lanczos iteration. (Uniformly sampled random noise was
also tried as the initial u for solving the eigenvalue problem, but the convergence was
unacceptably slow.)

Under the isotropic flow, with the scalar function in Equation (12) set to
1/(1 + a‖∇I^σ‖²)^p, all the significant level curves shrank and vanished in a few
thousand iterations. The bottom frames of Figure 1a show the results of anisotropic
geodesic flow. The numerically steady state shown in the figure remained stable even
after a few hundred thousand iterations. Sharpening of the edges can be clearly seen
in the graphs of the cross-sections.
Fig. 1a. Top Right: A synthetic image. Top Left: Horizontal and
vertical sections through the middle. Bottom Right: Smoothing
by anisotropic geodesic flow. Bottom Left: The two sections.



The solution to the eigenvalue problem is shown in the top row of Figure 1b. The
figure shows that the method is not as effective as the method of geodesic flow for
denoising or deblurring. In fact, the solution is very close to the initial vector.

The best results were obtained using Equations (8) corresponding to the
segmentation functional (6), as shown in the bottom row of Figure 1b. The advantage
of the segmentation functional over the curve evolution formulation is that denoising
and edge detection are done simultaneously. The formulation makes it possible for the
smoothed intensity u and the edge-strength function v to interact and reinforce each
other. In the example shown, u is in fact almost piecewise constant.

Figures 2a and 2b portray the results for an MR image. This image is more difficult
to deal with, since the intensity gradient and the curvature vary widely along
the object boundaries, resulting in varying degrees of smoothing. This is especially
true of the thin protrusions and indentations. The top row in Figure 2a shows the
original image together with graphs of two horizontal cross-sections. The top graph is
Fig. 1b. Top Right: Result of solving the eigenvalue problem. Top Left: The two
sections. Bottom Right: Result by L¹ functional. Bottom Left: The two sections.



a section near the top of the image, while the bottom graph is through the two ventricles
in the middle. The bottom row shows the effect of smoothing under anisotropic flow,
using the original image both as the initial u and for calculating the metric. Figure
2b shows the results of smoothing the zero-crossings of the Laplacian of the
Gaussian-smoothed image. The case σ = 1/2 is shown in the top row, the left frame being
the initial zero-crossings. The case σ = 3/√2 is shown in the bottom row. The anisotropic
metric was computed from the original (unsmoothed) image. Stability of the significant
boundaries is indicated by the close similarity between the curves in the two figures.
Fig. 2a. Top Right: An MR image. Top Left: Two horizontal sections. Bottom 
Abstract. In this paper, we propose a new model for active contours to
detect objects in a given image, based on techniques of curve evolution,
the Mumford-Shah functional for segmentation, and level sets. Our model
can detect objects whose boundaries are not necessarily defined by the
gradient. The model is a combination of more classical active contour
models using mean curvature motion techniques and the Mumford-Shah
model for segmentation. We minimize an energy which can be seen as a
particular case of the so-called minimal partition problem. In the level set
formulation, the problem becomes a “mean-curvature flow”-like evolution of
the active contour, which stops on the desired boundary. However,
the stopping term does not depend on the gradient of the image, as in
the classical active contour models, but is instead related to a particular
segmentation of the image. Finally, we present various experimental
results, in particular some examples for which the classical snake
methods based on the gradient are not applicable.



1 Introduction 



The basic idea in active contour models or snakes is to evolve a curve, subject
to constraints from a given image u_0, in order to detect objects in that image.
For instance, starting with a curve around the object to be detected, the curve
moves toward its interior normal under some constraints from the image, and
has to stop on the boundary of the object.

Let Ω be a bounded open subset of ℝ², with ∂Ω its boundary. Let u_0 be
a given image, as a bounded function defined on Ω with real values. Usually,
Ω is a rectangle in the plane and u_0 takes values between 0 and 255. Denote by
C(s): [0, 1] → ℝ² a piecewise parameterized curve.

In all the classical snakes and active contour models (see for instance [7], [3],
[9], [4]), an edge detector is used to stop the evolving curve on the boundaries of
the desired object. Usually, this is a positive and regular edge-function g(|∇u_0|),
decreasing such that lim_{t→∞} g(t) = 0. For instance,

$$g(|\nabla u_0|) = \frac{1}{1 + |\nabla G_\sigma * u_0|^2},$$

where G_σ * u_0 is the convolution of the image u_0 with the Gaussian
G_σ(x, y) = exp(−|x² + y²|/4σ) (a smoother version of u_0). The function g(|∇u_0|) will


M. Nielsen et al. (Eds.): Scale-Space’99, LNCS 1682, pp. 141–151, 1999. 
© Springer-Verlag Berlin Heidelberg 1999 







T. Chan and L. Vese 



be strictly positive in homogeneous regions and near zero on the edges. The
evolving curve moves by a variant of the mean curvature motion [14], with the
edge-function g(|∇u_0|) as an extra factor in the velocity.
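The edge-function g(|∇u_0|) = 1/(1 + |∇G_σ * u_0|²) is a pixelwise operation once the image has been smoothed. The sketch assumes that its input has already been convolved with the Gaussian G_σ, so only the gradient and the rational function are computed here. Derivatives are taken with `np.gradient`.

```python
import numpy as np

def edge_function(smoothed_u0):
    """g = 1 / (1 + |grad(G_sigma * u0)|^2); the input is assumed already Gaussian-smoothed."""
    gx, gy = np.gradient(np.asarray(smoothed_u0, dtype=float))
    return 1.0 / (1.0 + gx**2 + gy**2)
```

On a constant image, g equals 1 everywhere. At a discrete step of height 10, the central-difference gradient is 5 and g = 1/26, small but not zero. This is a concrete case of the bounded discrete gradients that keep the stopping function from vanishing on edges.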

All these classical snakes or active contour models rely on this edge-function
g, depending on the gradient |∇u_0| of the image, to stop the curve evolution.
Therefore, these models can detect only objects with edges defined by the gradient.
Also, in practice, the discrete gradients are bounded, so the stopping function
g is never zero on the edges and the curve may pass through the boundary.
On the other hand, if the image u_0 is noisy, then the isotropic Gaussian
smoothing has to be strong, which will smooth the edges too. In this paper, we propose
a different active contour model, without a stopping edge-function, i.e. a model
which is not based on the gradient of the image u_0 for the stopping process.
The stopping term is based on Mumford-Shah segmentation techniques [13]. In
this way, we obtain a model which can detect contours both with and without
gradient, for instance objects with very smooth boundaries or even with
discontinuous boundaries. For a discussion on different types of contours, we refer the
reader to [6].

The outline of the paper is as follows. In the next section we introduce our 
model as an energy minimization and discuss the relationship with the Mumford-Shah 
functional for segmentation. In Section 3, we formulate the model in terms 
of level set functions, compute the associated Euler-Lagrange equations, and 
discuss the algorithm. We end the paper by validating our model with numerical 
results. We show in particular how we can detect contours without gradient or 
cognitive contours [6], for which the classical models are not applicable, and also 
how we can automatically detect interior contours. 

Before describing our proposed model, we would like to refer the reader to 
the works [10] and [11] for shape recovery using level sets and edge-function, and 
to more recent and related works by [19], [17], and [8]. 

Finally, we would also like to mention [21] and [12] on shape reconstruction 
from unorganized points, and the recent works [15] and [16], where a probability-based 
geodesic active region model combined with classical gradient-based 
active contour techniques is proposed. 



2 Description of the model 

Let $C$ be the evolving curve. We denote by $c_1$ and $c_2$ two constants, representing 
the averages of $u_0$ “inside” and “outside” the curve $C$. 

Our model is the minimization of an energy-based segmentation. Let us first 
explain the basic idea of the model in a simple case. Assume that the image 
$u_0$ is formed by two regions of approximately piecewise-constant intensities, 
of distinct values $u_0^i$ and $u_0^o$. Assume further that the object to be detected is 
represented by the region with the value $u_0^i$, and let us denote its boundary by $C_0$. 
Then we have $u_0 \approx u_0^i$ inside the object (inside $C_0$) and $u_0 \approx u_0^o$ outside the 
object (outside $C_0$). Now let us consider the following “fitting energy”, formed by 
two terms: 

$$F_1(C) + F_2(C) = \int_{\mathrm{inside}(C)} |u_0 - c_1|^2\,dx\,dy + \int_{\mathrm{outside}(C)} |u_0 - c_2|^2\,dx\,dy,$$

where $C$ is any other variable curve. We say that the boundary of the object $C_0$ 
is the minimizer of the fitting energy: 

$$\inf_C \{F_1(C) + F_2(C)\} \approx 0 \approx F_1(C_0) + F_2(C_0).$$

This can be seen easily. For instance, if the curve $C$ is outside the object, then 
$F_1(C) > 0$ and $F_2(C) \approx 0$. If the curve $C$ is inside the object, then $F_1(C) \approx 0$ 
but $F_2(C) > 0$. Finally, the fitting energy will be minimized if $C = C_0$, i.e. if 
the curve $C$ is on the boundary of the object. These remarks are illustrated in 
Fig. 1. 



[Fig. 1 panels: $F_1(C) > 0$, $F_2(C) \approx 0$ | $F_1(C) > 0$, $F_2(C) > 0$ | $F_1(C) \approx 0$, $F_2(C) > 0$ | $F_1(C) \approx 0$, $F_2(C) \approx 0$]



Fig. 1. Consider all possible cases in the position of the curve. The “fitting energy” is 
minimized only for the case when the curve is on the boundary of the object. 
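The four configurations of Fig. 1 can be checked numerically. The sketch below (Python with NumPy; function names are hypothetical) replaces the integrals by sums over unit-area pixels and represents inside($C$) by a rectangular mask. As in the model, $c_1$ and $c_2$ are taken as the averages of $u_0$ inside and outside $C$.

```python
import numpy as np

def fitting_energy(u0, inside):
    """Return (F1, F2) for the region `inside` (boolean mask of inside(C)).
    Integrals are replaced by sums over unit-area pixels."""
    c1 = u0[inside].mean()    # average of u0 inside C
    c2 = u0[~inside].mean()   # average of u0 outside C
    F1 = float(np.sum((u0[inside] - c1) ** 2))
    F2 = float(np.sum((u0[~inside] - c2) ** 2))
    return F1, F2

def box_mask(shape, top, bottom, left, right):
    """Boolean mask of rows [top, bottom) and columns [left, right)."""
    m = np.zeros(shape, dtype=bool)
    m[top:bottom, left:right] = True
    return m

# Object of intensity 1 occupying rows/columns 10..29 on a background of 0.
u0 = np.zeros((40, 40))
u0[10:30, 10:30] = 1.0

cases = {
    "curve outside the object": box_mask(u0.shape, 5, 35, 5, 35),    # C encloses the object
    "curve crossing the object": box_mask(u0.shape, 0, 20, 0, 20),
    "curve inside the object": box_mask(u0.shape, 15, 25, 15, 25),
    "curve on the boundary": box_mask(u0.shape, 10, 30, 10, 30),
}
for name, mask in cases.items():
    F1, F2 = fitting_energy(u0, mask)
    print(f"{name}: F1 = {F1:.1f}, F2 = {F2:.1f}")
```

Only the last configuration gives $F_1 \approx 0$ and $F_2 \approx 0$, which matches the caption of Fig. 1.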



Therefore, in our active contour model we will minimize this fitting energy, 
and we can add some regularizing terms, like the length of $C$ and/or the area 
inside $C$. We introduce the energy $F(C, c_1, c_2)$ by: 

$$F(C, c_1, c_2) = \mu \cdot \mathrm{length}(C) + \nu \cdot \mathrm{area}(\mathrm{inside}(C)) + \lambda_1 \int_{\mathrm{inside}(C)} |u_0 - c_1|^2\,dx\,dy + \lambda_2 \int_{\mathrm{outside}(C)} |u_0 - c_2|^2\,dx\,dy,$$

where $c_1$ and $c_2$ are constant unknowns, and $\mu \ge 0$, $\nu \ge 0$, $\lambda_1, \lambda_2 > 0$ are fixed 
parameters. 

In almost all our computations, we take $\nu = 0$ and $\lambda_1 = \lambda_2$. Of course, 
one of these parameters can be “eliminated” by fixing it to be 1. The area term in the energy can 
be used, for instance, when we need to force the curve to move only inside. 

In order to balance the terms and their dimensions in the energy, if $d$ is the 
unit distance in the $\Omega$-plane, then $\mu$ has to be measured in units of (size of $u_0$)$^2 \cdot 
d$, and $\nu$ has to be measured in units of (size of $u_0$)$^2$. 
Finally, we consider the minimization problem: 

$$\inf_{C, c_1, c_2} F(C, c_1, c_2). \tag{1}$$



2.1 Relation with the Mumford-Shah functional for segmentation 

The Mumford-Shah functional for segmentation is [13]: 

$$F^{MS}(u, C) = \int_{\Omega \setminus C} \left( \alpha |\nabla u|^2 + \beta |u - u_0|^2 \right) dx\,dy + \mathrm{length}(C), \tag{2}$$

where $\alpha, \beta$ are positive parameters. The solution image $u$ obtained by minimizing 
this functional is formed by smooth regions $R_i$ with sharp boundaries, 
denoted here by $C$. 

A reduced form of this problem, as pointed out by D. Mumford and 
J. Shah in [13], is simply the restriction of $F^{MS}$ to piecewise constant functions 
$u$, i.e. $u = c_i$ with $c_i$ a constant, on each connected component $R_i$ of $\Omega \setminus C$. 
Since the gradient term then vanishes, minimizing $\beta \int_{R_i} |c_i - u_0|^2\,dx\,dy$ over $c_i$ shows 
that the constants $c_i$ are in fact the averages of $u_0$ on each $R_i$. The reduced 
case is called the minimal partition problem. 

Our active contour model is a particular case of the minimal partition problem, 
in which we look for the best approximation $u$ of $u_0$, as a function taking 
only two values, namely: 

$$u = \begin{cases} \mathrm{average}(u_0) & \text{inside } C, \\ \mathrm{average}(u_0) & \text{outside } C, \end{cases}$$

and with one edge $C$, represented by the snake or the active contour. 

This particular case of the minimal partition problem can be formulated and 
solved using the level set method [14]. This is presented in the next section. 



2.2 The level set formulation of the model 



In the level set method [14], an evolving curve $C$ is represented by the zero level 
set of a Lipschitz continuous function $\phi : \Omega \to \mathbb{R}$. So, $C = \{(x, y) \in \Omega : \phi(x, y) = 
0\}$, and we choose $\phi$ to be positive inside $C$ and negative outside $C$. For the level 
set formulation of our variational active contour model we essentially follow [20]. 
Therefore, we replace the unknown variable $C$ by the unknown variable $\phi$, and 
the new energy, still denoted by $F(\phi, c_1, c_2)$, becomes: 



$$F(\phi, c_1, c_2) = \mu \cdot \mathrm{length}\{\phi = 0\} + \nu \cdot \mathrm{area}\{\phi \ge 0\} + \lambda_1 \int_{\phi \ge 0} |u_0 - c_1|^2\,dx\,dy + \lambda_2 \int_{\phi < 0} |u_0 - c_2|^2\,dx\,dy.$$



Using the Heaviside function $H$ defined by 

$$H(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{if } x < 0, \end{cases}$$
and the one-dimensional Dirac measure $\delta$ concentrated at 0 and defined by 

$$\delta(x) = \frac{d}{dx} H(x) \quad \text{(in the sense of distributions)},$$

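we express the terms in the energy $F$ in the following way: 

$$\mathrm{length}\{\phi = 0\} = \int_\Omega |\nabla H(\phi)|\,dx\,dy = \int_\Omega \delta(\phi)\,|\nabla \phi|\,dx\,dy,$$

$$\mathrm{area}\{\phi \ge 0\} = \int_\Omega H(\phi)\,dx\,dy,$$

and 

$$\int_{\phi \ge 0} |u_0 - c_1|^2\,dx\,dy = \int_\Omega |u_0 - c_1|^2 H(\phi)\,dx\,dy,$$

$$\int_{\phi < 0} |u_0 - c_2|^2\,dx\,dy = \int_\Omega |u_0 - c_2|^2 \left(1 - H(\phi)\right) dx\,dy.$$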

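Then the energy $F(\phi, c_1, c_2)$ can be written as: 

$$F(\phi, c_1, c_2) = \mu \int_\Omega \delta(\phi)\,|\nabla \phi|\,dx\,dy + \nu \int_\Omega H(\phi)\,dx\,dy + \lambda_1 \int_\Omega |u_0 - c_1|^2 H(\phi)\,dx\,dy + \lambda_2 \int_\Omega |u_0 - c_2|^2 \left(1 - H(\phi)\right) dx\,dy.$$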



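Keeping $\phi$ fixed and minimizing the energy $F(\phi, c_1, c_2)$ with respect to the 
constants $c_1$ and $c_2$ (setting $\partial F / \partial c_1 = -2\lambda_1 \int_\Omega (u_0 - c_1) H(\phi)\,dx\,dy = 0$, and similarly for $c_2$), 
it is easy to express these constants as functions of $\phi$: 

$$c_1(\phi) = \frac{\int_\Omega u_0\, H(\phi)\,dx\,dy}{\int_\Omega H(\phi(x, y))\,dx\,dy} \quad \text{(the average of } u_0 \text{ in } \{\phi \ge 0\}\text{)}, \tag{3}$$

$$c_2(\phi) = \frac{\int_\Omega u_0 \left(1 - H(\phi)\right) dx\,dy}{\int_\Omega \left(1 - H(\phi(x, y))\right) dx\,dy} \quad \text{(the average of } u_0 \text{ in } \{\phi < 0\}\text{)}. \tag{4}$$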
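The following minimal discrete sketch of (3) and (4) assumes Python with NumPy, pixel sums in place of integrals and the sharp Heaviside function. The names `region_averages` and `fitting_terms` are illustrative only. It shows that the region averages are exactly the values of $c_1$ and $c_2$ that minimize the $\lambda_1$ and $\lambda_2$ terms of $F$ for a fixed $\phi$.

```python
import numpy as np

def heaviside(phi):
    """Sharp Heaviside: 1 where phi >= 0, 0 where phi < 0."""
    return (np.asarray(phi) >= 0).astype(float)

def region_averages(u0, phi):
    """c1(phi) and c2(phi) from (3)-(4); assumes both regions are non-empty."""
    H = heaviside(phi)
    c1 = np.sum(u0 * H) / np.sum(H)
    c2 = np.sum(u0 * (1.0 - H)) / np.sum(1.0 - H)
    return c1, c2

def fitting_terms(u0, phi, c1, c2, lam1=1.0, lam2=1.0):
    """lambda_1 and lambda_2 terms of F(phi, c1, c2), as sums over pixels."""
    H = heaviside(phi)
    return (lam1 * np.sum((u0 - c1) ** 2 * H)
            + lam2 * np.sum((u0 - c2) ** 2 * (1.0 - H)))

# Random image and phi = signed distance to a circle of radius 6 (positive inside).
rng = np.random.default_rng(0)
u0 = rng.random((20, 20))
yy, xx = np.mgrid[0:20, 0:20]
phi = 6.0 - np.hypot(yy - 10, xx - 10)
c1, c2 = region_averages(u0, phi)
```

Each term is a quadratic in its constant, so moving $c_1$ or $c_2$ away from these averages, in either direction, can only increase the energy.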



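Keeping $c_1$ and $c_2$ fixed, and formally minimizing the energy with respect 
to $\phi$, we obtain the Euler-Lagrange equation for $\phi$ (parameterizing the descent 
direction by an artificial time): 

$$\frac{\partial \phi}{\partial t} = \delta(\phi) \left[ \mu\, \mathrm{div}\left( \frac{\nabla \phi}{|\nabla \phi|} \right) - \nu - \lambda_1 (u_0 - c_1)^2 + \lambda_2 (u_0 - c_2)^2 \right] \quad \text{in } \Omega,$$

$$\frac{\delta(\phi)}{|\nabla \phi|} \frac{\partial \phi}{\partial \vec{n}} = 0 \quad \text{on } \partial\Omega,$$

where $\vec{n}$ denotes the exterior normal to the boundary $\partial\Omega$. 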



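In practice, we have to consider slightly regularized versions of the functions 
$H$ and $\delta$, denoted here by $H_\varepsilon$ and $\delta_\varepsilon$, such that $\delta_\varepsilon(x) = H'_\varepsilon(x)$. 

A first possible regularization by $C^2$ and $C^1$ functions respectively, as proposed 
for instance in [20], is: 

$$H_{1,\varepsilon}(x) = \begin{cases} 1 & \text{if } x > \varepsilon, \\ 0 & \text{if } x < -\varepsilon, \\ \dfrac{1}{2} \left[ 1 + \dfrac{x}{\varepsilon} + \dfrac{1}{\pi} \sin\left( \dfrac{\pi x}{\varepsilon} \right) \right] & \text{if } |x| \le \varepsilon, \end{cases}$$

and 

$$\delta_{1,\varepsilon}(x) = H'_{1,\varepsilon}(x) = \begin{cases} 0 & \text{if } |x| > \varepsilon, \\ \dfrac{1}{2\varepsilon} \left[ 1 + \cos\left( \dfrac{\pi x}{\varepsilon} \right) \right] & \text{if } |x| \le \varepsilon. \end{cases}$$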

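In our calculations, we use instead the following $C^\infty$ regularized versions of 
$H$ and $\delta$, defined by: 

$$H_{2,\varepsilon}(x) = \frac{1}{2} \left( 1 + \frac{2}{\pi} \arctan\left( \frac{x}{\varepsilon} \right) \right), \qquad \delta_{2,\varepsilon}(x) = H'_{2,\varepsilon}(x) = \frac{1}{\pi} \cdot \frac{\varepsilon}{\varepsilon^2 + x^2}.$$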

As $\varepsilon \to 0$, both approximations converge to $H$ and $\delta$. The first approximations 
$H_{1,\varepsilon}$ and $\delta_{1,\varepsilon}$ are $C^2$ and $C^1$ functions respectively, with $\delta_{1,\varepsilon}$ having a small 
compact support around the zero level set. The second approximations $H_{2,\varepsilon}$ 
and $\delta_{2,\varepsilon}$ are both $C^\infty$ functions, with $\delta_{2,\varepsilon}$ nonzero everywhere. 
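For concreteness, the two pairs of regularized functions can be written as vectorized NumPy functions (the names are hypothetical). The checks that follow confirm numerically that $\delta_{1,\varepsilon} = H'_{1,\varepsilon}$ and $\delta_{2,\varepsilon} = H'_{2,\varepsilon}$, that $\delta_{1,\varepsilon}$ vanishes outside $[-\varepsilon, \varepsilon]$, and that $\delta_{2,\varepsilon}$ is positive everywhere. That last property is the one relied on in the next paragraph.

```python
import numpy as np

def heaviside_1(x, eps):
    """H_{1,eps}: C^2, equal to H outside [-eps, eps]."""
    x = np.asarray(x, dtype=float)
    mid = 0.5 * (1.0 + x / eps + np.sin(np.pi * x / eps) / np.pi)
    return np.where(x > eps, 1.0, np.where(x < -eps, 0.0, mid))

def delta_1(x, eps):
    """delta_{1,eps} = H_{1,eps}': C^1, supported on [-eps, eps] only."""
    x = np.asarray(x, dtype=float)
    mid = (1.0 + np.cos(np.pi * x / eps)) / (2.0 * eps)
    return np.where(np.abs(x) > eps, 0.0, mid)

def heaviside_2(x, eps):
    """H_{2,eps}: C-infinity regularization of H."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(x / eps))

def delta_2(x, eps):
    """delta_{2,eps} = H_{2,eps}' = eps / (pi (eps^2 + x^2)), positive everywhere."""
    x = np.asarray(x, dtype=float)
    return (eps / np.pi) / (eps**2 + x**2)

# Compare each delta with a central-difference derivative of its H.
x = np.linspace(-10.0, 10.0, 2001)
eps, h = 1.0, 1e-5
num1 = (heaviside_1(x + h, eps) - heaviside_1(x - h, eps)) / (2 * h)
num2 = (heaviside_2(x + h, eps) - heaviside_2(x - h, eps)) / (2 * h)
```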

We want to explain here, formally, why we need to introduce the second approximations 
instead of the first ones, which have been used in previous 
papers (for instance in [20]). Because our energy is non-convex (therefore allowing 
many local minima), and because $\delta_{1,\varepsilon}$ has a very small compact support, 
the interval $[-\varepsilon, \varepsilon]$, the iterative algorithm may depend on the initial curve 
and will not necessarily compute a global minimizer. In some of our tests using 
the first approximation, we obtained only a local minimizer of the energy. Using 
the second approximations, the algorithm tends to compute a global 
minimizer. One of the reasons is that, with the first approximation, the Euler-Lagrange equation acts only 
locally, on a few level curves around $\phi = 0$, while with the second approximation 
the equation acts on all level curves, more strongly on the zero level curve 
but not only locally. In this way, in practice, we 
can obtain a global minimizer, independently of the position of the initial curve. 
Moreover, interior contours are automatically detected. We could also extend 
the motion to all level sets of $\phi$ by replacing $\delta(\phi)$ in the equation by $|\nabla \phi|$ (this 
method is used, for instance, in [20]). 

To discretize the equation in $\phi$, we use an implicit finite-difference scheme 
(we refer the reader to [1] for details). 
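The paper uses an implicit scheme [1]. Purely as an illustration, the sketch below substitutes a simpler explicit gradient-descent step for the Euler-Lagrange equation. It uses $H_{2,\varepsilon}$ and $\delta_{2,\varepsilon}$ and recomputes $c_1$ and $c_2$ from (3) and (4) at each step. Several details are assumptions of this sketch: the central differences via `np.gradient`, which is one-sided at the image border, so the boundary condition on $\partial\Omega$ is only approximated; the time step `dt`; and the small constant `eta` that avoids division by zero in the curvature. The function names are hypothetical.

```python
import numpy as np

def heaviside_2(x, eps):
    """H_{2,eps}(x) = (1 + (2/pi) arctan(x / eps)) / 2."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(x / eps))

def delta_2(x, eps):
    """delta_{2,eps}(x) = eps / (pi (eps^2 + x^2)), the derivative of H_{2,eps}."""
    return (eps / np.pi) / (eps**2 + x**2)

def curvature(phi, eta=1e-8):
    """div(grad phi / |grad phi|) with central differences; eta avoids 0/0."""
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx**2 + gy**2 + eta)
    return np.gradient(gx / norm, axis=1) + np.gradient(gy / norm, axis=0)

def chan_vese_step(phi, u0, mu=0.2, nu=0.0, lam1=1.0, lam2=1.0, dt=0.5, eps=1.0):
    """One explicit descent step of the Euler-Lagrange equation; returns (phi, c1, c2)."""
    H = heaviside_2(phi, eps)
    c1 = np.sum(u0 * H) / np.sum(H)                # (3) with H replaced by H_{2,eps}
    c2 = np.sum(u0 * (1.0 - H)) / np.sum(1.0 - H)  # (4) with H replaced by H_{2,eps}
    force = (mu * curvature(phi) - nu
             - lam1 * (u0 - c1) ** 2 + lam2 * (u0 - c2) ** 2)
    return phi + dt * delta_2(phi, eps) * force, c1, c2

# Square object, initial phi = signed distance to a circle (positive inside).
u0 = np.zeros((30, 30))
u0[8:22, 8:22] = 1.0
yy, xx = np.mgrid[0:30, 0:30]
phi = 5.0 - np.hypot(yy - 12.0, xx - 12.0)
new_phi, c1, c2 = chan_vese_step(phi, u0)
```

In use, one would call `chan_vese_step` repeatedly and reinitialize $\phi$ every few iterations. Because $\delta_{2,\varepsilon} > 0$, every pixel moves in the direction set by the bracketed force, not only those near $\phi = 0$. An explicit scheme also needs a small enough `dt` to stay stable, which is one reason to prefer the implicit scheme cited above.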

We also need, at each step, to reinitialize $\phi$ to be the signed distance function to 
its zero level curve. This procedure is standard (see [18] and [20]). It prevents 
the level set function from becoming too flat, and it can be seen as a stabilization and 
rescaling of $\phi$. 

This reinitialization procedure is performed by the following evolution equation 
[18]: 

$$\psi_\tau = \mathrm{sign}(\phi(t))\,(1 - |\nabla \psi|), \qquad \psi(0, \cdot) = \phi(t, \cdot), \tag{5}$$

where $\phi(t, \cdot)$ is our solution $\phi$ at time $t$. Then the new $\phi(t, \cdot)$ will be $\psi$, where 
$\psi$ is obtained at the steady state of (5). 
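The paper does not specify how (5) is discretized. A common choice, assumed here and not taken from the paper, is a Godunov upwind scheme with a smoothed sign function $\phi/\sqrt{\phi^2 + 1}$ on a unit grid. The function name, the iteration count and the pseudo-time step `dt` below are hypothetical. Starting from three times the signed distance to a circle, the iteration should bring $|\nabla\psi|$ back close to 1 while keeping the sign of $\phi$ away from the zero level set.

```python
import numpy as np

def reinitialize(phi0, n_iter=150, dt=0.3):
    """Iterate psi_tau = sign(phi0) (1 - |grad psi|) with psi(0) = phi0.
    Unit grid spacing, Godunov upwind gradient, smoothed sign (assumptions)."""
    phi0 = np.asarray(phi0, dtype=float)
    s = phi0 / np.sqrt(phi0**2 + 1.0)      # smoothed sign(phi0)
    psi = phi0.copy()
    for _ in range(n_iter):
        p = np.pad(psi, 1, mode="edge")
        a = psi - p[1:-1, :-2]             # backward difference in x
        b = p[1:-1, 2:] - psi              # forward difference in x
        c = psi - p[:-2, 1:-1]             # backward difference in y
        d = p[2:, 1:-1] - psi              # forward difference in y
        ap, am = np.maximum(a, 0), np.minimum(a, 0)
        bp, bm = np.maximum(b, 0), np.minimum(b, 0)
        cp, cm = np.maximum(c, 0), np.minimum(c, 0)
        dp, dm = np.maximum(d, 0), np.minimum(d, 0)
        # Godunov approximation of |grad psi|, upwinded according to sign(phi0)
        grad_pos = np.sqrt(np.maximum(ap**2, bm**2) + np.maximum(cp**2, dm**2))
        grad_neg = np.sqrt(np.maximum(am**2, bp**2) + np.maximum(cm**2, dp**2))
        grad = np.where(s > 0, grad_pos, grad_neg)
        psi = psi + dt * s * (1.0 - grad)
    return psi

# phi0 = 3 x (signed distance to a circle of radius 10), positive inside.
yy, xx = np.mgrid[0:40, 0:40]
dist = 10.0 - np.hypot(yy - 20.0, xx - 20.0)
psi = reinitialize(3.0 * dist)
```

The design keeps $\mathrm{sign}(\phi)$ fixed during the inner iterations, as in (5), so information propagates outward from the zero level set on both sides.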

3 Experimental results 

We present here numerical results using our model. For the examples in Figures 
2-5, we show the image and the evolving contour (top), together with the 
piecewise-constant approximations given by the averages $c_1$ and $c_2$ (bottom). 
In all cases, we start with a single initial closed curve. We choose the level set 
function $\phi$ to be positive “inside” the initial curve and negative “outside” the 
initial curve, but in our model this choice is not important: we could consider 
the opposite signs, and the curve would still be attracted by the object. Also, 
the position of the initial curve is not important. 

In Fig. 2 we show how our model can detect contours without gradient, or 
cognitive contours (see [6]), and an interior contour automatically, starting with 
only one initial curve. This is obtained using our second approximations for $H$ 
and $\delta$. In Fig. 3 we consider a very noisy image. Again the interior contour of 
the torus is automatically detected. 

In Fig. 4 we validate our model on a very different problem: detecting features 
in spatial point processes in the presence of substantial clutter. One example is 
the detection of minefields using reconnaissance aircraft images that identify 
many objects that are not mines. These problems are, for instance, solved using 
statistical methods (see for instance [5] and [2]). With this application, we show 
again that our model can be used to detect objects or features with contours 
without gradient. This is not possible using classical snakes or active contours 
based on the gradient. 

We end the paper with results on two real images (Figs. 5 and 6), illustrating 
all the properties of our model: detecting smooth boundaries, the scaling role of the 
length term in the size of the detected objects, and automatic change of topology. 




Fig. 2. Detection of different objects in a synthetic image, with various convexities and 
with an interior contour, which is automatically detected. Here we illustrate the fact 
that our model can detect edges without gradient. Top: $u_0$ and the contour. Bottom: 
the piecewise-constant approximation of $u_0$. 
Fig. 3. Results for a very noisy image, with the initial curve not surrounding the 
objects. Top: $u_0$ and the contour. Bottom: the piecewise-constant approximation of 
$u_0$. 




Fig. 4. Detection of a simulated minefield, with contour without gradient. 
Fig. 5. Detection of a galaxy with very smooth boundaries. 




Fig. 6. Detection of the contours of a galaxy. 



4 Concluding remarks 

In this paper we proposed an active contour model based on Mumford-Shah segmen- 
tation techniques and level set methods. Our model is not based on an edge-function, 
like in the classical active contour models, to stop the evolving curve on the desired 
boundary. We do not need to smooth the initial image, even if it is very noisy, and in 
this way the locations of boundaries are very well detected. Also, we can detect objects 
whose boundaries are not necessarily defined by gradient, or that have very smooth boundaries. 
The model automatically detects interior contours, starting with only one initial 
curve. The initial curve does not need to start around the objects to be detected. 
Finally, we validated our model by various numerical results. 
Abstract. Multiscale segmentation respectful of visual perception 
is an important issue in Computer Vision. We present an image model 
derived from the level sets representation which offers most of the properties 
sought in a good segmentation: the borders are located at the perceptual 
edges; they are invariant under affine maps and contrast changes; 
they are sorted according to their perceptual significance using a scale 
parameter. Finally, a compact version of this model has been developed 
to be used in a progressive and artifact-free image compression scheme. 



1 Introduction 

One of the basic problems of image analysis is to define a mathematical representation 
that offers suitable properties upon which subsequent computer vision 
algorithms can operate. 

The edge detection theory of Hildreth and Marr [19] [26] was one of the 
first attempts to solve this problem using a multiscale analysis. The raw primal 
sketch of D. Marr is based on the detection of the intensity changes in the image, 
by recording the zero-crossing locations of the image filtered by the Laplacian 
of the Gaussian at a given scale. The edges are then defined as discontinuity 
lines, and the scale parameter allows one to discriminate the important atoms. This 
approach has been successfully developed in the past, since it meets almost all 
the requirements of a “good primal sketch” (see e.g. [10] for an optimal edge detector). 
Recent years have seen interesting reformulations of this multiscale 
edge representation, in a wavelet [18] and in a variational [21] framework. 

However, this approach still suffers from some drawbacks that do not make it 
always suitable for some processes: the representation is not invariant under 
contrast changes. This means that the edge locations of an image to which a 
contrast change has been applied differ from the original edge locations. It has been 
well known since M. Wertheimer [25] that the visual perception of edges does 
not depend on the light level. Therefore, edges should not be computed using a 
discrete derivative. This deviation is a major inconvenience for pattern recognition 
processes, but it does not really apply to compression problems. This last 
domain is concerned with the completeness of the representation. Although the 
classical linear multiscale edge representation is mathematically not complete, 
algorithms which allow one to reconstruct an image close to the original have been 
described [14] [18]. But when the representation is altered by a compression 
process, visual artifacts (e.g. Gibbs phenomena) appear on the reconstructed 
image. 

More recently, it has been proved in [2] that, under fairly general conditions (including 
invariance under change of contrast, under change of scale and under affine 
maps), there exists only one regular multiscale analysis, the so-called AMSS (for 
Affine Morphological Scale Space). An image is decomposed into this scale-space 
using a parabolic evolution equation, for which viscosity solutions [9] exist. Because 
of the morphological invariance, the evolution of the image along the scales 
is equivalent to the evolution of its level curves, which are defined as the borders 
of the level sets. A level set is a set of pixels with gray levels below (or above) a 
given threshold. 

The representation of an image by its level sets has been proposed by the 
Mathematical Morphology school [22] as a geometrical decomposition which offers 
contrast invariance. This representation can be viewed as another raw 
primal sketch, for which all the good properties are met except compactness. Recently, 
such a decomposition based on the connected components of the level 
lines has been described [20], together with a fast algorithm. This representation 
is well adapted to a number of image analysis problems (such as pattern matching), but 
it still suffers from a relatively large amount of data. Using AMSS, a simplification 
of the image can be performed to reduce the amount of data. However, the 
structure of the image is then considerably weakened and the filtered image does 
not look natural. 

In this paper, we introduce another model based on the level sets that can 
be coded using a very small amount of data. We propose in Section 2 a new 
definition of morphological edges, based on the selection of the most perceptive 
level sets. We use the conjecture expressed in [6] that the atoms of visual 
perception are pieces of level lines joining junctions. The regions formed by 
these level sets compose a segmentation at a given scale. Section 3 describes our 
image model: using an elliptic PDE [7], the image is approximated by smooth 
and non-oscillating functions on each region given by the segmentation. This 
defines a sketch image which carries the most important structures, up to the 
scale of the least perceptive edges. Depending on the number of pixel values 
retained along the region borders, the image model may be used as a compact 
image representation. A straightforward application of this compact model is 
the design of a compression scheme that respects the human visual system. We 
illustrate this capacity in Section 4. 

2 Morphological edges and multiscale segmentation 

This section addresses the problem of computing a multiscale and morphological 
segmentation $\mathcal{P} = \{P_i,\ i = 1, \ldots, n\}$ such that the borders of the most important 
structures match the borders of the regions $P_i$. A classical answer would be 
to try to extract the edges, in the sense of the discontinuity lines in the image. 
However, the use of classical edge detectors is not consistent with the morphological 
approach, as is well explained in [6], essentially because such edges are 
not contrast invariant. In that paper, V. Caselles, B. Coll and J.M. Morel argue 
that the atoms of perception, that is, the basic elements on which further 
representations may be built, are not edges but “pieces of level lines joining 
junctions”. 

Let us first recall what level lines are. Let $\Omega$ be an open bounded subset 
of $\mathbb{R}^2$. The total variation of an image $u : \Omega \to \mathbb{R}$ can be simply defined, if 
$u \in C^1(\Omega)$, as 

$$TV(u) = \int_\Omega |\nabla u(x)|\,dx. \tag{1}$$

If the gradient $\nabla u$ of $u$ does not exist or is not continuous, but $u \in L^1(\Omega)$, (1) 
is generalized into 

$$TV(u) = \sup_\phi \left\{ \int_\Omega u(x)\,(\mathrm{div}\,\phi)(x)\,dx \;:\; \phi \in C^1_c(\Omega, \mathbb{R}^2),\ |\phi| \le 1 \right\}. \tag{2}$$

We say that $u$ is of bounded variation ($u \in BV(\Omega)$) if $TV(u) < +\infty$. A set 
$P \subset \Omega$ has finite perimeter if $\mathrm{per}(P) = TV(\mathbf{1}_P) < +\infty$. In that case $\partial^* P$, the 
essential boundary of $P$, is a union of a countable set of Jordan curves [8]. 
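On a pixel grid, a simple discretization of (1) sums the absolute forward differences along both axes. This anisotropic ($\ell^1$) version is an assumption of the sketch below, whose function names are hypothetical. It measures lengths along the grid axes, so a diagonal boundary is overestimated by a factor of about $\sqrt{2}$, but axis-aligned perimeters come out exact. With it, $\mathrm{per}(P) = TV(\mathbf{1}_P)$ counts the pixel edges that separate $P$ from its complement.

```python
import numpy as np

def total_variation(u):
    """Anisotropic discrete TV: sum of |forward differences| along both axes."""
    u = np.asarray(u, dtype=float)
    return float(np.abs(np.diff(u, axis=0)).sum() + np.abs(np.diff(u, axis=1)).sum())

def perimeter(P):
    """per(P) = TV(1_P) for a boolean mask P."""
    return total_variation(np.asarray(P, dtype=float))

# A 4 x 5 rectangle of pixels inside a 20 x 20 domain.
P = np.zeros((20, 20), dtype=bool)
P[5:9, 3:8] = True
print(perimeter(P))   # 2 * (4 + 5) pixel edges
```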

Let $L_\lambda$ be the lower level set $\lambda$ and $M_\mu$ the upper level set $\mu$ of $u$: 

$$L_\lambda = \{x \in \Omega : u(x) \le \lambda\}, \qquad M_\mu = \{x \in \Omega : u(x) \ge \mu\}. \tag{3}$$

We shall call level set any lower or upper level set. The family $(L_\lambda)_\lambda$ or $(M_\mu)_\mu$ 
is a complete representation, since one can reconstruct the image by 

$$u(x) = \inf\{\lambda : x \in L_\lambda\} = \sup\{\mu : x \in M_\mu\}. \tag{4}$$
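Formula (4) can be illustrated on a quantized image, where a finite set of levels suffices. This sketch assumes Python with NumPy, uses hypothetical names, and takes the inequalities in (3) to be non-strict. The lower and upper level sets are stored as boolean masks. The image is rebuilt by keeping, at each pixel, the smallest $\lambda$ (respectively the largest $\mu$) whose level set contains it. The final check illustrates the contrast invariance discussed below: for a strictly increasing $g$, the level sets of $g(u)$ are those of $u$, only relabelled.

```python
import numpy as np

def lower_level_sets(u):
    """{lambda: L_lambda = {u <= lambda}} for every grey level present in u."""
    return {lam: u <= lam for lam in np.unique(u)}

def upper_level_sets(u):
    """{mu: M_mu = {u >= mu}} for every grey level present in u."""
    return {mu: u >= mu for mu in np.unique(u)}

def reconstruct_from_lower(sets, shape):
    """u(x) = inf{lambda : x in L_lambda}: smaller levels overwrite larger ones."""
    out = np.full(shape, np.inf)
    for lam in sorted(sets, reverse=True):
        out[sets[lam]] = lam
    return out

def reconstruct_from_upper(sets, shape):
    """u(x) = sup{mu : x in M_mu}: larger levels overwrite smaller ones."""
    out = np.full(shape, -np.inf)
    for mu in sorted(sets):
        out[sets[mu]] = mu
    return out

rng = np.random.default_rng(1)
u = rng.integers(0, 6, size=(12, 12)).astype(float)
g = lambda t: t * t * t + 2.0 * t      # a strictly increasing contrast change
v = g(u)
```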

If $u$ is BV, then all level sets are of finite perimeter and their essential boundaries 
constitute the level lines of $u$. If we map the level lines of an image for a 
given set of levels $\{\lambda_1 < \lambda_2 < \ldots < \lambda_n = +\infty\}$, we get a segmentation of the image 
with sets of type $\{x \in \Omega : \lambda_{i-1} < u(x) \le \lambda_i\}$, also called topographic map [6] (see 
first line of Figure 3). More generally, one can consider a segmentation achieved 
using only some connected components of lower level sets $(L_\lambda)_\lambda$ and upper level 
sets $(M_\mu)_\mu$. Notice that pieces of some (but not all) level lines are located at the 
perceptive edges and that, conversely, all perceptive edges correspond to pieces 
of some level lines. A topographic map also has interesting invariance properties: 
the map commutes with any affine transformation performed on the image 
(translation, rotation, and zoom) and it does not change when the contrast of the 
image is modified (the so-called morphological property). Thus, a topographic 
map achieves a morphological segmentation with suitable properties to build an 
image model based on perceptual edges. The question is now: how should we 
select the level sets so that the level lines match the visual perception of edges 
as well as possible? 
We shall emphasize that the physical generation process of an image implies 
some events (such as occlusions and transparencies) which cause singularities on the 
topographic map: level lines joining some other level lines with a shape (more or 
less) like a T in case of an occlusion. The T-junction singularity is one of the most 
significant principles of visual reconstruction, which allows a geometrical 
constitution of the visual objects. It is at the heart of Gestalt theory, and 
in particular of Kanizsa's work [15] [16]. Each time a T-junction is detected, 
our perception reconstructs the occlusion of an object by another one, and the 
border of the occluded object is mentally extended behind the horizontal bar of 
the T. In the left drawing of Figure 1, the observer reconstructs black disks from 
quarters of disks only. This phenomenological description, originally formulated 
by G. Kanizsa in the case of drawings, can be easily adapted to digital images 
using level lines [1]. The main difference lies in the fact that on drawings, T-junctions 
occur only where the line of the pen meets a previous line, that is, at 
places where an object begins to come in front of another. On digital images 
of the natural world, the objects are never uniformly lit, and therefore even 
uniformly colored surfaces present many level lines. At the borders of an object, 
these level lines meet the level lines of the background and generate multiple 
T-junctions: occlusions occur along all the borders. The shapes of the objects are 
then essentially characterised by the T-junctions on them, and by the pieces of 
the level lines joining these junctions. In this way, one may give a morphological 
definition of edges: we call morphological edge a piece of level line joining any 
number of T-junctions. The more T-junctions a morphological edge contains, 
the more perceptually significant it is: the number of T-junctions contained in 
a morphological edge behaves as an inverse scale parameter. 




(a) (b) 



Fig. 1. The visual power of T-junctions. The human visual system reconstructs the 
black objects of the drawing (a) as disks partially covered by boxes. This reconstruction 
is due to the T-junctions, which are obvious on the topographic map (b). One of 
Kanizsa's principles states that the border of the occluded object has to be extended so 
as to preserve its curvature. 

To detect the significant T-junctions on natural images, we use an algorithm 
adapted from [6] which ensures the existence of three connected components of 
non-negligible size, one belonging to the occluding object, one belonging to the 
occluded object and one part of the background. The geometry of every recorded 
T-junction is characterized by only two of these three connected components. 
For our image model, we consider the one which is a lower level set $L_\lambda$ and the 
one which is an upper level set $M_\mu$ (see Figure 2). In this way, morphological 
edges are composed of borders of connected components denoted $L^k_\lambda$ and $M^k_\mu$. 
Notice that in the discrete case $\Omega \subset \mathbb{Z}^2$ the border of a region lies in the shifted 
grid $(\mathbb{Z} + 1/2)^2$. If we denote by $\partial P$ the internal border of a region $P$ (which lies in 
the $\mathbb{Z}^2$ grid), to every T-junction $x \in (\mathbb{Z} + 1/2)^2$ can be associated the neighbor 
pixels $x_\lambda \in \partial L_\lambda$ and $x_\mu \in \partial M_\mu$. 




Fig. 2. The T-junction detection algorithm ensures the existence of three significant 
connected components, one being part of a lower level set $L_\lambda$ and one being part of an 
upper level set $M_\mu$ (in this example, $\lambda = 10$ and $\mu = 30$). The T-junction point $x$ does 
not belong to the pixel grid, but the points $x_\lambda$ and $x_\mu$ do. 



Since each morphological edge belongs to a border of a set $L_\lambda$ or $M_\mu$, the 
issue is to choose, from all possible connected components of the level set families 
$(L_\lambda)_\lambda$ and $(M_\mu)_\mu$, the ones that contain the greatest numbers of T-junctions: 

Multiscale segmentation algorithm 

Parameter: $s \in [0, 1]$ is the scale of the segmentation. 

Step 1: decompose the image $u$ into its level sets $(L_\lambda)_\lambda$ and $(M_\mu)_\mu$. 

Step 2: compute the sequences of T-junctions $T_L = (x^k_\lambda)_{\lambda,k}$, $T_M = (x^k_\mu)_{\mu,k}$ 
and the associated connected components $L^k_\lambda$ and $M^k_\mu$. Let $N$ be the number of 
T-junctions: $N = |T_L| = |T_M|$. 

Step 3: sort the sequences $L^k_\lambda$ and $M^k_\mu$ in the order given by the number of 
T-junctions: $L$ is before $L'$ if $|T_L \cap \partial L| > |T_L \cap \partial L'|$, and $M$ is before $M'$ 
if $|T_M \cap \partial M| > |T_M \cap \partial M'|$. 

Step 4: the multiscale segmentation $\mathcal{P}$ is made of the first $N(1 - s)$ connected 
components of the sequences $L^k_\lambda$ and $M^k_\mu$. 
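Steps 3 and 4 amount to a ranking followed by a truncation. The sketch below is plain Python, and the function name and dictionary layout are hypothetical. It assumes that Steps 1 and 2 have already produced, for each connected component, the number of T-junctions on its border, and it applies the truncation to the lower and upper sequences separately. Rounding $N(1 - s)$ down is also an assumption.

```python
def select_components(tj_count, n_junctions, s):
    """Keep the first N(1 - s) components, ranked by decreasing number of T-junctions.

    tj_count: hypothetical dict mapping a component id to |T ∩ ∂component|.
    n_junctions: N, the total number of T-junctions.
    s: scale parameter in [0, 1].
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("scale s must lie in [0, 1]")
    ranked = sorted(tj_count, key=tj_count.get, reverse=True)  # Step 3
    keep = int(n_junctions * (1.0 - s))                          # Step 4, rounded down
    return ranked[:keep]

# Hypothetical lower level set components and their T-junction counts.
lower = {"L_a": 7, "L_b": 2, "L_c": 5, "L_d": 0}
N = sum(lower.values())
print(select_components(lower, N, 0.8))
```

The same call would be made for the upper level set components $M^k_\mu$. As $s$ increases, only the components bounded by the most T-junctions, that is, by the most significant morphological edges, survive.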

When this algorithm ends, the topographic map defined by the level sets in 
$\mathcal{P}$ is a morphological segmentation of $u$ such that each $\partial P_i$ is made of pieces of 
morphological edges: $\mathcal{P} = \{P_i\}_i = \{L^k_\lambda\}_{k,\lambda} \cup \{M^k_\mu\}_{k,\mu}$. The resolution of the 
segmentation, that is, the visual significance of the least perceptive edge, is given 
by the scale parameter $s$: when $s$ tends to 0 almost all level sets are mapped (even 
those which are not perceptually significant), and when $s$ tends to 1 only level 
sets with borders matching the most important perceptual edges are considered. 
An example of a segmentation at two different scales is shown in Figure 3, in the first 
column of the second and third lines. 

Remark 

— The scale is not directly related to the size of the regions. Although level 
sets of big size are likely to contain more T-junctions than level sets of 
smaller size, the level lines corresponding to a border of a small object may 
be recorded at a coarse scale, while the level lines corresponding to the 
gradation of the light inside a large surface may not. 

— The algorithm works for natural images (i.e. photographs of the real world) 
only. For synthetic images, one would like to replace the T-junction crite- 
rion by something related to the contrast of the level lines. With natural 
images, we should not use the contrast, since the segmentation would not be 
morphological. 



3 Compact and multiscale image model 

The compact image model is based on the multiscale segmentation and on the pixel 
values needed to reconstruct an approximation of the original image. 

We use a piecewise-smooth and non-oscillating approximation $v_s$ of the image 
$u$ in each region defined by the multiscale segmentation. In this way, we remove 
the upper part of the total variation of $u$. In [13], we show how the total variation 
can be related to perceptual microtextures: microtextures correspond to fast 
oscillating parts at every scale, and are therefore associated with high variations. 
On the other hand, edges surrounding flat regions generate little variation. The 
coarea formula [11] links the total variation of $u$ with the total length 
of its level lines: 

$$TV(u) = \int_{\mathbb{R}} \mathrm{per}(L_\lambda)\,d\lambda. \tag{5}$$

Therefore, our morphological approach leads us to split the information between 
edges and microtextures. More precisely, the compact image model tends to 
remove the microtextures when the scale $s$ increases, whereas borders of the main 
structures are kept. For this reason, the approximation $v_s$ is a sketch of $u$ at the 
scale $s$ of the segmentation. One may also want to preserve the microtextures and 
code them separately. This can be done by computing the error image $w_{1-s} = u - v_s$ 
and by removing some residual sketch information in $w_{1-s}$. This problem is 
developed in [13] and will not be addressed here. 

Our model contains two types of data: geometrical data record pixel locations 
on the grid, and numerical data are related to the gray level values at these 
locations. The numerical data are used to compute the sketch $v_s$ of $u$ defined 
on each region $P_i$ of the segmentation. The multiscale segmentation algorithm 
ensures that no important edge can be located inside $P_i$: this explains why 
the image may be well approximated by a piecewise-smooth function $v_s$. In 
addition, the approximating function is chosen non-oscillating so that it captures 
the sketch and not the textures. How should the data be chosen to allow a 
good approximation? Since the border of $P_i$ is made of morphological edges, 
the knowledge of $u$ on the internal and external side of each edge is the basic 
information. In order to get a compact model, we propose to retain only two 
samples of $u$ for each level line (one for the internal side and one for the external side). 
Since the external side corresponds to the internal side of the adjacent level line 
(remember that both lower and upper level sets are recorded), it is actually 
enough to retain one value per level line. 

A pixel $x$ of $\partial P_i$ may be associated with several morphological edges (it may belong to different level sets). In that case, $v_S(x)$ is set to the value closest to $u(x)$. This operation corresponds to choosing the smallest level set, and it is implemented by means of an “inf” (lower level sets) or a “sup” (upper level sets) over all level lines containing $x$. On the internal side of a level line, we have to choose how to represent a sequence of gray level values by a single one. The values may be chosen to lower the variation of $v_S$ between two neighbouring regions of a T-junction. This not only helps to get a non-oscillating function, but also prevents the appearance of visual artefacts near the edges by keeping a low contrast. In that case, the value is the “sup” of the gray levels (lower level sets) or the “inf” (upper level sets). Another possibility is to try to preserve the average gray level of the border by computing the median or the mean value.

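These remarks lead to the following algorithm, which computes the values $\phi_i(x)$ of $v_S$ on each $\partial P_i$:

Algorithm to get samples of the approximating function

Step 1: Compute $v_S$ on the internal boundaries associated with lower level sets, as follows. For all $P_i \in \mathcal{P}$ and all $x \in \partial P_i$, if there exist $k, \lambda$ such that $x \in \partial L^k_\lambda$, and if $\phi_i(x)$ has not already been defined, define

$$\phi_i(x) = \inf_{\lambda, k \,:\, x \in \partial L^k_\lambda} \; \sup_{y \in \partial L^k_\lambda} u(y). \qquad (6)$$

Step 2: Compute $v_S$ on the internal boundaries associated with upper level sets, as follows. For all $P_i \in \mathcal{P}$ and all $x \in \partial P_i$, if there exist $k, \mu$ such that $x \in \partial M^k_\mu$, and if $\phi_i(x)$ has not already been defined, define

$$\phi_i(x) = \sup_{\mu, k \,:\, x \in \partial M^k_\mu} \; \inf_{y \in \partial M^k_\mu} u(y). \qquad (7)$$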
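As a reading aid, the two steps can be expressed per boundary pixel in a few lines of Python. This is a hedged sketch, not the author's implementation: the function name `sample_value` and the input layout (for each level line whose border passes through $x$, the list of gray levels $u(y)$ sampled along it) are assumptions made for illustration.

```python
from typing import Optional, Sequence


def sample_value(lower_lines: Sequence[Sequence[float]],
                 upper_lines: Sequence[Sequence[float]]) -> Optional[float]:
    """Value phi_i(x) at a boundary pixel x, following eqs. (6)-(7).

    lower_lines: for every border of a lower level set L^k_lambda that contains x,
        the gray levels u(y) sampled along that border (hypothetical layout).
    upper_lines: the same for the upper level sets M^k_mu.
    """
    if lower_lines:
        # Step 1, eq. (6): sup along each line, then inf over the lines through x
        return min(max(line) for line in lower_lines)
    if upper_lines:
        # Step 2, eq. (7): only used when Step 1 left phi_i(x) undefined
        return max(min(line) for line in upper_lines)
    return None  # x lies on no recorded level line


print(sample_value([[3, 5, 4], [2, 7]], [[9, 8]]))
```

Taking the inf (resp. sup) over the lines through $x$ selects the line whose gray levels stay closest to $u(x)$, which is the “smallest level set” rule described above.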

To summarize our discussion, the compact image model contains the following data:

- the segmentation map $(\partial P_i)_i$;

- the sequence of samples $(\phi_i(x))_{i,\, x \in \partial P_i}$.

An example of such data is shown in Figure 3, in the second column of the second and third lines.

The issue of computing an approximating function $v_S$ from the samples $(\phi_i(x))_{x \in \partial P_i}$ belongs to the class of interpolation problems. Different approaches using image interpolation techniques have been described in the literature (see for example [17], [4], [12]), some of them sharing our motive of catching the image sketch in a compact way. Recently, a morphological interpolation technique for image coding has been proposed by J. Casas in [5]. To our knowledge, our work is the first one to use a segmentation map based not on a classical (and non-morphological) edge detector, but on a selection of the level sets that carry the atoms of perception.

We shall retain the work of V. Caselles, B. Coll and J.M. Morel in [7], where they extend Casas' morphological interpolation technique, and where they prove that any interpolation operator (satisfying reasonable conditions, such as morphological, invariance and regularity properties) comes down to letting the interpolating function evolve on each $P_i$ according to the following equation:



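$$
\begin{cases}
\dfrac{\partial w}{\partial t} = D^2 w\left(\dfrac{Dw}{|Dw|}, \dfrac{Dw}{|Dw|}\right) & \forall t > 0,\ \forall x \in P_i;\\[1ex]
w(0, x) = w_0(x) & \forall x \in P_i;\\[1ex]
w(t, x) = w_0(x) = \phi_i(x) & \forall t > 0,\ \forall x \in \partial P_i.
\end{cases}
\qquad (8)
$$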



We denote by $Dw$ the gradient of $w$ with respect to the spatial coordinates, and by $D^2 w$ the Hessian of $w$, that is, the matrix of its second spatial derivatives.

Under some reasonable conditions [3], there exists a unique continuous viscosity solution $w(t, x)$ of (8) such that, for all $t > 0$, $w(t, \cdot)$ is a Lipschitz function on each $P_i$, with uniformly bounded Lipschitz norm. When $t \to +\infty$, $w(t, \cdot) \to v_i$ with $v_i|_{\partial P_i} = \phi_i$. The function $v_i$ is an absolutely minimizing Lipschitz interpolant of $\phi_i$ inside $P_i$, or AMLE for short. The evolution equation (8) can be solved using an implicit Euler scheme, which transforms the evolution problem into a sequence of nonlinear elliptic problems and leads, in the discrete case, to an implicit difference scheme. The sketch $v_S$ is defined by $v_S = v_i$ on each region $P_i$ of the multiscale segmentation $\mathcal{P}$.
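The implicit Euler scheme mentioned above is not reproduced here. As a simpler stand-in, swapped in purely for illustration, a discrete AMLE can be approximated on a pixel grid by repeatedly replacing each free pixel with the midrange (mean of the maximum and the minimum) of its four neighbours while the samples stay fixed. Fixed points of this update are a commonly used discrete analogue of solutions of the stationary equation $D^2 w(Dw/|Dw|, Dw/|Dw|) = 0$. The function name `amle_midrange`, the 4-neighbourhood and the use of the whole rectangular grid as the region are assumptions of this sketch.

```python
import numpy as np


def amle_midrange(values, fixed, n_iter=20000, tol=1e-10):
    """Approximate a discrete AMLE of the samples values[fixed] on a 2-D grid.

    values: 2-D array holding the samples phi_i(x) where `fixed` is True.
    fixed:  2-D boolean mask of the sample pixels (kept unchanged).
    Free pixels are repeatedly replaced by the midrange of their 4 neighbours.
    """
    values = np.asarray(values, dtype=float)
    fixed = np.asarray(fixed, dtype=bool)
    # Starting from the smallest sample gives a subsolution, so the iterates
    # increase monotonically towards the discrete solution.
    w = np.where(fixed, values, values[fixed].min())
    for _ in range(n_iter):
        p = np.pad(w, 1, constant_values=np.nan)  # NaN marks "outside the grid"
        nb = np.stack([p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]])
        mid = 0.5 * (np.nanmax(nb, axis=0) + np.nanmin(nb, axis=0))
        new = np.where(fixed, values, mid)
        if np.max(np.abs(new - w)) < tol:
            return new
        w = new
    return w


# Usage: samples on the border of a 10 x 10 square, given by u(x) = column index.
n = 10
target = np.tile(np.arange(n, dtype=float), (n, 1))
border = np.zeros((n, n), dtype=bool)
border[0, :] = border[-1, :] = border[:, 0] = border[:, -1] = True
w = amle_midrange(target, border)
```

Since affine functions satisfy the midrange rule exactly, affine boundary data are reproduced in the interior, which is what the usage lines exercise.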

Figures 3 and 4 give examples of a sketch image at different scales. 



4 Application to image compression 

In order to show how compact the multiscale image model is, the geometrical and numerical data have been losslessly compressed using adapted coding techniques. We do not claim that our compression scheme gives better results than well-established compression standards; further developments are needed before we could present a fair comparison. We just wish to mention the main advantage of a model based on level sets: the segmentation is not only invariant under affine maps and contrast changes, but it also allows the perceptual edges to be coded very precisely. On the contrary, compression schemes based on a space-frequency or on a linear scale-space representation cannot respect the perceptual edges when the compression ratio is too high: artifacts such as the Gibbs phenomenon appear in the neighborhood of the edges, due to the quantization of the Fourier coefficients, which decay poorly at discontinuities. The same phenomenon occurs with wavelet coefficients, although their space localization reduces the artifacts.








Figure 5 illustrates this problem. We have compressed the “House” image (a simple image with strong edges) at a high compression ratio of 38. Both the JPEG [24] (the standard for Fourier-based compression) and Shapiro's EZW [23] (Embedded Zerotree Wavelet, one of the best wavelet-based compression schemes) algorithms give very poor results. Our compact image model allows us to code the “House” at the same compression ratio without distortion of the edges. Of course, only the most important structures are preserved. To keep some texture, we propose in [13] to compute the error image and to compress it using a linear scale-space representation: since the error image does not have to code edges, we do not introduce blocking artifacts.

Fig. 3. This plate illustrates the multiscale image model. First line, from left to right: original Lenna 256 × 256 image; a topographic map which gives the borders of the lower level sets $L_\lambda$ for $\lambda = 8n$, $n = 0, \ldots, 32$; reconstruction using only the lower level sets of the previous image. Second line, from left to right: segmentation at scale $s = 0.65$; samples $\phi_i(x)$ used to compute the sketch; sketch at scale $s = 0.65$. Third line, from left to right: segmentation at scale $s = 0.90$; samples $\phi_i(x)$ used to compute the sketch; sketch at scale $s = 0.90$.

Fig. 4. Multiscale analysis of a view from a satellite. (a): original 256 × 256 image. (b): sketch at scale $s = 0.90$. (c): sketch at scale $s = 0.96$. (d): sketch at scale $s = 0.99$. Only 1% of all lower level sets and 1% of all upper level sets are used to reconstruct this image.

5 Conclusion 

The level sets decomposition is a well-known tool to get an invariant and morphological image representation. But it is perceived as a redundant structure generating a huge quantity of data. In this paper, we propose a method to select the most representative level sets. From these level sets, a sketch image can be reconstructed. It contains the borders of the most important structures, up to a scale parameter. This requires a new definition of edges, as level lines containing a great number of topological singularities (the T-junctions). This approach allows us to build a nonlinear scale-space image model that respects the human visual system, with a compactness that makes it suitable for image compression at low bit rates.

Fig. 5. Comparison with linear scale-space and space-frequency compression schemes. (a): original “House” image, size 256 × 256, 8 bpp (bits per pixel). (b): compact image model of (a) at a scale that leads to a bit rate of 0.21 bpp. This bit rate corresponds to a compression ratio of 38. (c): image (a) compressed at the same ratio using a biorthogonal wavelet scheme (EZW). (d): image (a) compressed at the same ratio using a windowed Fourier scheme (JPEG).
Abstract. It is well known that a conveniently rescaled iterated convolution of a linear positive kernel converges to a Gaussian. Therefore, all iterative linear smoothing methods of a signal or an image boil down to the application of the Heat Equation to the signal. In this survey, we explain how a similar analysis can be performed for iterative image smoothing by contrast invariant monotone operators. In particular, we prove that all iterated affine and contrast invariant monotone operators are equivalent to the unique affine invariant curvature motion. We also prove that, under very broad conditions, weighted median filters are equivalent to the Mean Curvature Motion Equation.
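The first sentence of the abstract can be checked numerically in one dimension. The sketch below is an illustration, not taken from the paper: it convolves the positive kernel $(1,1,1)/3$, whose variance is $2/3$, with itself $n$ times and compares the result with the Gaussian of variance $2n/3$, i.e. the heat kernel of $\partial_t u = \partial_{xx} u$ at time $t = n/3$. The kernel and the value of $n$ are arbitrary choices.

```python
import numpy as np


def iterate_kernel(kernel, n):
    """Return the n-fold convolution of a discrete probability kernel with itself."""
    out = np.array([1.0])                 # Dirac mass at 0
    for _ in range(n):
        out = np.convolve(out, kernel)    # full convolution: support grows by 2
    return out


kernel = np.array([1.0, 1.0, 1.0]) / 3.0  # positive, centred, variance 2/3
n = 50
p = iterate_kernel(kernel, n)             # defined on k = -n, ..., n
k = np.arange(-n, n + 1)
var = n * 2.0 / 3.0                       # variances add under convolution
gauss = np.exp(-k**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
err = np.abs(p - gauss).max()             # deviation from the Gaussian; shrinks as n grows
print(f"n = {n}, max deviation = {err:.2e}")
```

The rescaling in the statement corresponds to measuring positions in units of $\sqrt{n}$, so that the variance of the rescaled kernel stays fixed.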



Introduction 

The goal of this paper is to make precise the link between the morphological Scale Space Theory and the Mathematical Morphology Theory. The equations we will consider are the affine morphological scale space of Alvarez, Guichard, Lions and Morel [1] and the Motion by Mean Curvature (see for example [12], [9]). The reason why these equations are so relevant is that they are invariant with respect to some large classes of geometric transformations and contrast changes. We will introduce approximation schemes which have a theoretical interest in the affine invariant case, and both theoretical and practical interest for the Mean Curvature Motion. The plan is the following: in the first section, we recall some of the main results of Mathematical Morphology, particularly the characterization of operators commuting with continuous nondecreasing functions (contrast changes). In section 2, we prove that, under very mild assumptions, any rescaled affine and contrast invariant operator is asymptotically equivalent to the only affine and contrast invariant differential operator. These consistency results provide the convergence result in section 3, in which we prove that if the scale is adequately chosen, then the iterated operator converges to the affine and contrast invariant nonlinear semigroup. Section 4 will be devoted to the proof of the convergence of an algorithm previously introduced by Bence, Merriman and Osher in [5]. The results we prove are a generalization of the results proved by Guichard and Morel in [14] and Catté in [7] for the affine case, and generalize or revisit the results in [5], [8], [10], [11], [15].
1 Mathematical Morphology

1.1 Basic Results 

It is well known that in image analysis, one of the most basic tasks is to smooth an image $u_0(x)$ for noise removal and shape simplification. Such a smoothing should preserve as much as possible the essential features of an image. This requirement is most easily formalized in terms of invariance. Two invariance requirements are basic in this context. First, a smoothing operator $T$ should commute with contrast changes, that is, with increasing functions. Indeed, for physical and technological reasons, most digital images are known only up to a contrast change. The second obvious requirement is geometric invariance: since the position of the camera is in general arbitrary or unknown, the operator $T$ should commute with translations, rotations and, when possible, with affine and even projective transforms of the image plane. The School of Mathematical Morphology [18] [21] was one of the first to rigorously study and characterize operators acting on sets. We shall see that this theory is intimately linked with contrast invariant operators acting on images. As an image is known only up to a contrast change, it is better suited to consider the equivalence classes of functions that can be obtained from one another via a contrast change. An obvious consequence of this is that an image is completely determined by the geometry of its level sets: if $u : \mathbb{R}^N \to \mathbb{R}$ is a grey level image and $\lambda \in \mathbb{R}$, we call level set of $u$ at the value $\lambda$ the subset of $\mathbb{R}^N$

$$X_\lambda(u) = \{x \in \mathbb{R}^N,\ u(x) \ge \lambda\}.$$

It is obvious that two elements of the same equivalence class have the same level sets. We also note that the family $X_\lambda(u)$ satisfies the following properties:



$$\lambda \le \mu \implies X_\mu(u) \subset X_\lambda(u), \qquad (1)$$

$$X_\lambda(u) = \bigcap_{\mu < \lambda} X_\mu(u). \qquad (2)$$



Conversely, we can prove that if $(X_\lambda)_{\lambda \in \mathbb{R}}$ is any family of subsets satisfying equations (1) and (2), then it determines an equivalence class of images. More precisely, if we set $u(x) = \sup\{\lambda \text{ s.t. } x \in X_\lambda\}$, we see that the level sets of $u$ are the $X_\lambda$'s. It is true (though not trivial) that any function with the same level sets is obtained from $u$ by applying a contrast change. For this reason, the first object of Mathematical Morphology was to study operators acting on sets that respect the usual ordering given by inclusion. Since no point is a priori a privileged reference, translation invariant operators are particularly interesting, and we shall always make this assumption.
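For images with finitely many gray levels, the reconstruction formula $u(x) = \sup\{\lambda : x \in X_\lambda\}$ and the contrast invariance of the level sets can be checked directly. The Python sketch below assumes integer gray levels and uses the upper level sets $X_\lambda(u) = \{u \ge \lambda\}$; the helper names `level_sets` and `reconstruct` are hypothetical.

```python
import numpy as np


def level_sets(u, levels):
    """Upper level sets X_lambda(u) = {x : u(x) >= lambda} for each lambda."""
    return {lam: u >= lam for lam in levels}


def reconstruct(sets, shape):
    """u(x) = sup{lambda : x in X_lambda}, the sup being taken over the recorded levels."""
    u = np.full(shape, -np.inf)
    for lam, mask in sets.items():
        u = np.where(mask, np.maximum(u, lam), u)
    return u


rng = np.random.default_rng(0)
u = rng.integers(0, 16, size=(8, 8))       # integer gray levels 0..15
sets = level_sets(u, range(16))
u_rec = reconstruct(sets, u.shape)          # equals u pixel by pixel


def g(v):
    """A strictly increasing contrast change on the integers >= 0."""
    return v**3 + 2 * v
```

For a strictly increasing $g$, $\{g(u) \ge g(\lambda)\} = \{u \ge \lambda\}$: the family of level sets of $g \circ u$ is that of $u$, only relabelled.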

Definition 1. Let $T$ be an operator acting on subsets of $\mathbb{R}^N$. We say that $T$ is morphological if it is nondecreasing for the inclusion ordering and commutes with translations.



Matheron proved the following 
Theorem 1 (Matheron 1). If $T$ is a morphological operator acting on sets, then there is a family $\mathcal{B}$ of subsets of $\mathbb{R}^N$, called structuring elements, such that

$$TX = \bigcup_{B \in \mathcal{B}} \bigcap_{y \in B} (X - y).$$

The family $\mathcal{B}$ is not unique, but we can take $\mathcal{B} = \{X \subset \mathbb{R}^N,\ 0 \in TX\}$.

We saw that the knowledge of families of sets satisfying equations (1) and (2) is the same as the knowledge of a function up to a contrast change. We would like to transpose Definition 1 and Theorem 1 to functions. We are led to

Definition 2. Let $\mathcal{F}$ be a set of functions on $\mathbb{R}^N$ containing continuous functions and characteristic functions of level sets of elements of $\mathcal{F}$. We say that $T : \mathcal{F} \to \mathcal{F}$ is morphological if and only if $T$ is monotone (that is, $u \le v$ in $\mathbb{R}^N$ implies $Tu \le Tv$ in $\mathbb{R}^N$) and commutes with translations and with continuous nondecreasing functions (contrast changes).

Matheron then proved the expected result.

Theorem 2 (Matheron 2). Let $T$ be an operator defined on a set of functions $\mathcal{F}$ as in Definition 2. Then $T$ is morphological if and only if there exists a family $\mathcal{B}$ of subsets of $\mathbb{R}^N$, called structuring elements, such that

$$Tu(x) = \sup_{B \in \mathcal{B}} \inf_{y \in B} u(x + y). \qquad (3)$$

In the same way, there exists another family $\mathcal{B}'$ such that

$$Tu(x) = \inf_{B \in \mathcal{B}'} \sup_{y \in B} u(x + y). \qquad (4)$$
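Formula (3) translates directly into a discrete filter. The sketch below, for 1-D signals and a finite family of structuring elements given as integer offsets, is only an illustration: the border clamping, the helper name `sup_inf` and the family used in the example (the two-point segments $\{-1, 0\}$ and $\{0, 1\}$) are assumptions. Because each output value is one of the input values, the filter commutes exactly with any nondecreasing contrast change, and it is monotone, as Theorem 2 requires.

```python
import numpy as np


def sup_inf(u, family):
    """Tu(x) = sup over B in family of inf over y in B of u(x + y), cf. (3), for a 1-D signal.

    family: list of structuring elements, each a tuple of integer offsets.
    Indices falling outside the signal are clamped to the border (an assumption;
    the theory works on the whole space).
    """
    u = np.asarray(u)
    idx = np.arange(len(u))
    out = np.full(len(u), -np.inf)
    for B in family:
        vals = np.stack([u[np.clip(idx + y, 0, len(u) - 1)] for y in B])
        out = np.maximum(out, vals.min(axis=0))  # inf over B, then sup over the family
    return out


u = np.array([0, 5, 0, 3, 3, 0])
family = [(-1, 0), (0, 1)]
print(sup_inf(u, family))  # the width-1 peak is removed, the width-2 plateau is kept
```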



1.2 Scale Space and Mathematical Morphology 



In a completely different setting, the concept of Scale Space was introduced in [16] [17] [22]. To fix notations, it consists of a family of operators $(T_t)_{t \ge 0}$ acting on real-valued functions (grey level images) on $\mathbb{R}^N$ ($N \ge 2$). Let $u_t = T_t u_0$: it corresponds to smoothed versions of the image depending upon a scale parameter $t$. A complete axiomatization was presented in [1] [2], and all scale spaces were classified with respect to their geometrical invariance properties. It is then proved that $u_t$ is the solution of a second order parabolic PDE. Among the relevant PDEs, we find the Mean Curvature Motion (MCM)

$$\frac{\partial u}{\partial t} = \Delta u - \frac{\langle D^2 u\, Du, Du\rangle}{|Du|^2}. \qquad (5)$$



Another important result is that there exists a single Affine Morphological Scale Space (AMSS), i.e. a scale space commuting with nondecreasing functions and invariant under translations, grey level shifts and affine mappings of $\mathbb{R}^N$. Moreover, this scale




Morphological Scale Space and Mathematical Morphology 167 



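space is not projective invariant. Thus, in the framework of scale space theory, projective invariance and contrast invariance are incompatible. The affine and contrast invariant PDE in $\mathbb{R}^N$ is

$$\frac{\partial u}{\partial t}(x,t) = |Du|\, |\lambda_1 \cdots \lambda_{N-1}|^{\frac{1}{N+1}}\, H(\lambda_1, \ldots, \lambda_{N-1}), \qquad (6)$$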



where the $\lambda_i$ are the principal curvatures of the level surface of $u(\cdot, t)$ at $x$, and $H$ is equal to 1 if and only if the $\lambda_i$ are all strictly positive, to $-1$ if they are all strictly negative, and to 0 otherwise. The principal curvatures of $u$ are the eigenvalues of the second derivative $D^2 u$ restricted to the hyperplane orthogonal to $Du$, divided by $|Du|$. Of course, these curvatures are only defined when the gradient is different from 0. Note that by Matheron's Theorem, if the considered scale space is morphological, the operator $T_t$ can be written in an “inf-sup” form with a family of structuring elements depending on the scale $t$. When $t$ tends to 0, the operator $T_t - \mathrm{Id}$ is approximately $t$ times the infinitesimal generator of the nonlinear semigroup of the equation. The basic idea is to retrieve the solution of the PDE by constructing an operator that is tangent to the infinitesimal generator. In [8], Catté, Dibos and Koepfler already established a link between both points of view (Matheron's Mathematical Morphology and geometrical PDEs) by proving that if the family of structuring elements $\mathcal{B}$ is an isotropic family of segments of equal length centered at the origin, then adequately rescaled iterated Matheron filters converge to the viscosity solution of the Mean Curvature Equation. A more general result was presented in [20], where structuring elements were exhibited to approximate equations of the type

$$\frac{\partial u}{\partial t} = |Du|\,(\operatorname{curv} u)^{\gamma}$$

in the plane, for all $\gamma > 0$.
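To make the curvature terms concrete in the planar case, the sketch below evaluates the right-hand side $|Du|\,\operatorname{sign}(\operatorname{curv} u)\,|\operatorname{curv} u|^{\gamma}$ with central finite differences, where $\operatorname{curv} u = (u_{xx}u_y^2 - 2u_xu_yu_{xy} + u_{yy}u_x^2)/|Du|^3$ is the curvature of the level line. In the plane, $\gamma = 1$ gives the Mean Curvature Motion (5) and $\gamma = 1/3$ the AMSS (6). The unit grid step, the function name `curvature_speed` and the choice of returning 0 where $Du$ vanishes are assumptions of this sketch.

```python
import numpy as np


def curvature_speed(u, gamma=1.0, eps=1e-12):
    """|Du| * sign(curv u) * |curv u|**gamma for a 2-D array u (unit grid step).

    gamma = 1: planar Mean Curvature Motion; gamma = 1/3: planar AMSS.
    Where |Du| < eps the equations are undefined and 0 is returned (an assumption).
    """
    uy, ux = np.gradient(u)          # axis 0 = y (rows), axis 1 = x (columns)
    uyy, _ = np.gradient(uy)
    uxy, uxx = np.gradient(ux)
    g2 = ux**2 + uy**2
    num = uxx * uy**2 - 2.0 * ux * uy * uxy + uyy * ux**2
    with np.errstate(divide="ignore", invalid="ignore"):
        curv = num / g2**1.5         # curvature of the level line through each pixel
        speed = np.sqrt(g2) * np.sign(curv) * np.abs(curv) ** gamma
    return np.where(g2 > eps**2, speed, 0.0)


# Usage: u = x^2 + y^2 has circular level lines, curvature 1/r and |Du| = 2r.
y, x = np.mgrid[-6:7, -6:7].astype(float)
u = x**2 + y**2
mcm = curvature_speed(u, 1.0)        # equals 2 at interior points
amss = curvature_speed(u, 1.0 / 3.0) # equals 2 r^(2/3) at interior points
```

At the point $(x, y) = (3, 4)$, i.e. array index `[10, 9]`, the radius is $r = 5$, so the two speeds are $2$ and $2 \cdot 5^{2/3}$; central differences are exact for this quadratic away from the array border.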



2 Approximation scheme for the Affine Scale Space

In this section, we prove (under some basic assumptions that are not restrictive at all) that if we adequately scale any Matheron morphological affine invariant operator, then the iterated associated operators converge to the semigroup of the affine invariant geometrical evolution PDE of the classification established in [1]. Let us make precise that we call an operator affine invariant if it commutes with affine transformations of determinant 1. In [13], it is proved that it must be covariant with respect to any affine transformation, but a scale factor simply depending on the determinant of the transformation must then be introduced. We note that our schemes in the affine invariant case cannot be used to numerically approximate the Affine Scale Space. This has been tested, but the results were not very satisfying because, for numerical reasons, affine invariance is only approximated. In the case of inf-sup schemes, this approximation is very poor if the family of structuring elements is the family of ellipses with area 1. On the contrary, a very fast morphological and affine invariant algorithm is presented by Moisan in [19] to compute the solution of the AMSS Equation in the two-dimensional case. By Matheron's Theorem, this scheme can also be reformulated in terms of inf-sup, but the family is less simple. The extension of Guichard and Morel's result to any dimension is interesting because three-dimensional images, and even movies of three-dimensional images, are already available in the medical domain. These last ones can be considered as four-dimensional images, whereas two-dimensional movies can also be seen as three-dimensional images. From Matheron's Theorem 1, we can also assume that $T$ is affine invariant if and only if the family $\mathcal{B}$ is also affine invariant. Let $\mathcal{B}$ be a family of structuring elements. Let us introduce a scale parameter $s$ and consider the family of structuring elements $\mathcal{B}_s$ obtained from $\mathcal{B}$ by a simple dilation: $\mathcal{B}_s = s^{1/N} \mathcal{B}$ ($N$ being the space dimension). The real $s$ is thus a scale parameter linked to the size of the structuring elements. Let us now introduce the operator

$$IS_s u(x) = \inf_{B \in \mathcal{B}_s} \sup_{y \in B} u(x + y) \qquad (7)$$

and the dual operator

$$SI_s u(x) = \sup_{B \in \mathcal{B}_s} \inf_{y \in B} u(x + y). \qquad (8)$$



Proposition 1. Let $\mathcal{B}$ be an affine invariant family of structuring elements, closed with respect to the Hausdorff distance, whose elements are closed, convex, symmetric with respect to 0 and of measure 1, and let $u : \mathbb{R}^N \to \mathbb{R}$ be a function. Then there exists a positive constant $c_{\mathcal{B}}$, depending only on $\mathcal{B}$, such that

$$\lim_{s \to 0} \frac{IS_s u(x) - u(x)}{s^{\frac{2}{N+1}}} = c_{\mathcal{B}}\, p\, \left(\lambda_1^+ \cdots \lambda_{N-1}^+\right)^{\frac{1}{N+1}}, \qquad (9)$$

where $p = |Du(x)|$ and $\lambda_1, \ldots, \lambda_{N-1}$ are the principal curvatures of the level surface going through $x$ (in the formula above, the exponent $+$ denotes the positive part of a number, that is to say $\lambda^+ = \max(\lambda, 0)$).

We have a similar result when replacing $IS_s$ by $SI_s$ (obtained by swapping inf and sup) and the $\lambda_i^+$ by $\lambda_i^-$.



We shall also need consistency results on the alternate operator $SI_s IS_s$. To this end, we prove that, except at critical points, consistency is uniform. We then establish the link between consistency and convergence. Let $h = s^{\frac{2}{N+1}}$ and

$$T_h = SI_s IS_s.$$

Theorem 3. Let $u_0 \in BUC(\mathbb{R}^N)$ (bounded and uniformly continuous). The approximate solutions $u_h$ defined by

$$\forall n \in \mathbb{N},\ \forall t \in [(n-1)h, nh[, \qquad u_h(t, x) = T_h^n u_0(x) \qquad (10)$$

converge towards the unique solution in $BUC(\mathbb{R}^N)$ of

$$\frac{\partial u}{\partial t} = c_{\mathcal{B}}\, |Du|\, |\lambda_1 \cdots \lambda_{N-1}|^{\frac{1}{N+1}}\, H(\lambda_1, \ldots, \lambda_{N-1}) \qquad (11)$$

with initial data $u_0$. Here $H(\lambda_1, \ldots, \lambda_{N-1}) = -1$ if the $\lambda_i$ are all negative, 1 if they are all positive, and 0 otherwise. Convergence is uniform on every compact set of $\mathbb{R}^N \times \mathbb{R}^+$.

We split the proof of consistency into a series of lemmas. We do not enter into details, since the proofs are somewhat long and technical; a complete version is given in [6]. To simplify the notations, we denote by $IS_s(px + ax^2 + \sum_i b_i y_i^2 + \sum_i c_i x y_i)$ the value of the operator $IS_s$ applied to this polynomial function
and taken at the origin. Let s = 

Lemma 1. Set

$$c_{\mathcal{B}} = IS_1(x + y_1^2 + \cdots + y_{N-1}^2).$$

If the structuring elements are closed, convex, symmetric with respect to the origin and of measure 1, then $c_{\mathcal{B}} > 0$.



Lemma 2. Assume that $p > 0$. If $b_1, \ldots, b_{N-1} > 0$, then

$$IS_s(px + b_1 y_1^2 + \cdots + b_{N-1} y_{N-1}^2) = c_{\mathcal{B}}\, p\, s^{\frac{2}{N+1}} (\lambda_1 \cdots \lambda_{N-1})^{\frac{1}{N+1}}, \qquad (12)$$

where $\lambda_i = 2b_i/p$ is the $i$-th principal curvature. If one of the $b_i$ is nonpositive, then

$$IS_s(px + b_1 y_1^2 + \cdots + b_{N-1} y_{N-1}^2) = o\big(s^{\frac{2}{N+1}}\big). \qquad (13)$$

Moreover, in both cases, the inf-sup is attained for structuring elements included in a ball whose radius is $o(r)$.

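Lemma 3. There exists a function $G(p, (b_i), (c_i))$, bounded on every compact subset of $\mathbb{R}^{+*} \times \mathbb{R}^{N-1} \times \mathbb{R}^{N-1}$, such that

$$IS_s\Big(px + ax^2 + \sum_{i=1}^{N-1} b_i y_i^2 + \sum_{i=1}^{N-1} c_i x y_i\Big) = c_{\mathcal{B}}\, p \cdots$$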

All these lemmas are based on estimates, in every direction, of the size of the structuring elements attaining values near the infimum in the $IS_s$ operator. The proofs of these lemmas, though not very complicated and using only basic mathematical tools, are heavy and can be found in [6]. Proposition 1 is a simple consequence of this set of lemmas, since it suffices to use translation and rotation invariance to write the Taylor expansion of any regular function in the form used in Lemma 3.
We insisted in Lemma 2 on the fact that the structuring elements attaining the right value of the inf-sup operator were included in a ball with radius r =

This technical detail is in fact crucial, since it allows us to assert that $IS_s$ is basically a local operator (which it is not a priori, since the family of structuring elements is not bounded). More precisely, we can bound $IS_s$ from below and from above by two operators that are really local. Define



$$I_s u(x) = \inf_{B \in \mathcal{B}_s} \sup_{y \in B \cap B(0,r)} u(x + y), \qquad (14)$$

and

$$S_s u(x) = \inf_{B \in \mathcal{B}_s,\ B \subset B(0,r)} \sup_{y \in B} u(x + y). \qquad (15)$$



Then it is clear that $I_s u \le IS_s u \le S_s u$ (for $I_s$ we take the supremum over smaller sets, and for $S_s$ we take the infimum over a subfamily of structuring elements). Now, the fact that, in Lemma 2 above, the structuring elements for quadratic forms are included in the ball $B(0, r)$ implies that $I_s$, $S_s$ and $IS_s$ are nearly equal, since by only taking points in $B(0,r)$, no information is lost. By construction, $I_s u(0)$ and $S_s u(0)$ are local (they only depend on the values of $u$ in the ball $B(0,r)$). Thus, to compute $IS_s u(0)$ for a function $u$, we only need to consider the values of $u$ in $B(0,r)$, using a Taylor expansion. The remainder in this expansion is $O(r^3)$, and $r$ is precisely chosen to make this remainder negligible compared with the asymptotic terms given in Lemmas 2 and 3. In addition, we can prove that this argument also implies that the error term appearing in the consistency results is uniformly bounded in a ball with fixed radius near a point where the gradient and the curvature are not zero. A consequence is that consistency is uniform near those points. Uniform consistency is a sufficient condition to prove convergence. Another consequence is that we also get consistency results for the alternate operator $SI_s IS_s$.
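The ordering $I_s u \le IS_s u \le S_s u$ holds for any family, which a toy discrete computation makes visible. The sketch below uses 1-D signals and a small hand-picked family of offset sets that all contain 0 (so truncation to the ball never leaves an empty set); the family is not affine invariant, and the function names `sup_on` and `sandwich` are hypothetical.

```python
import numpy as np


def sup_on(u, x, offsets):
    """sup of u(x + y) over the given offsets; indices are clamped to the signal."""
    n = len(u)
    return max(u[min(max(x + y, 0), n - 1)] for y in offsets)


def sandwich(u, family, r, x):
    """Return (I_r u(x), IS u(x), S_r u(x)) in the spirit of (14), (7) and (15).

    Every element of `family` must contain the offset 0, and at least one
    element must lie inside [-r, r].
    """
    inf_sup = min(sup_on(u, x, B) for B in family)
    # (14): keep every element, but take the sup only over B restricted to [-r, r]
    truncated = min(sup_on(u, x, [y for y in B if abs(y) <= r]) for B in family)
    # (15): keep only the elements contained in [-r, r]
    inside = min(sup_on(u, x, B) for B in family if all(abs(y) <= r for y in B))
    return truncated, inf_sup, inside


rng = np.random.default_rng(1)
u = rng.normal(size=30)
family = [(-1, 0, 1), (0, 1, 2), (-2, -1, 0), (-6, 0, 6)]  # toy family
values = [sandwich(u, family, r=2, x=x) for x in range(len(u))]
```

The wide element $(-6, 0, 6)$ is what makes the three values differ: truncation shrinks it to $\{0\}$, while the subfamily in (15) drops it altogether.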

Proposition 2. Let $\mathcal{B}$ be a family of structuring elements invariant under $SL(\mathbb{R}^N)$, whose elements are closed, convex, symmetric with respect to the origin and of Lebesgue measure 1. There exists $c_{\mathcal{B}} > 0$ such that, for any function $u$ with $Du(x) \ne 0$,

$$\lim_{s \to 0} \frac{SI_s IS_s u(x) - u(x)}{s^{\frac{2}{N+1}}} = c_{\mathcal{B}}\, |Du(x)|\, |\lambda_1 \cdots \lambda_{N-1}|^{\frac{1}{N+1}}\, H(\lambda_1, \ldots, \lambda_{N-1}), \qquad (16)$$

where the $(\lambda_i)$ are the principal curvatures of the level surface passing through $x$, $H = 1$ if all the curvatures are positive, $H = -1$ if they are all negative, and $H = 0$ otherwise.



The gain here is that this operator is consistent with the true generator of the Affine Morphological Scale Space (contrary to the $IS_s$ operator, see Proposition 1). The convergence of the scheme is then nearly guaranteed. Nevertheless, we saw that problems may occur at points where the gradient is equal to zero. To rigorously prove convergence, we first need the following lemma, which allows us to control the growth of $T_h$ at critical points.
Lemma 4. Let $a(x) = |x|^p$. Then

$$\lim_{(x,h) \to (0,0)} \frac{T_h a(x) - a(x)}{h} = 0. \qquad (17)$$

The main point here is that the limit is taken for $x \to 0$ and $h \to 0$ independently. The operator $T_h$ can be either $IS_h$, $SI_h$ or $SI_h IS_h$.

A direct consequence of this lemma is the following. 

Lemma 5. Let $u$ be a bounded function such that $Du(x) = 0$. Let $x_h$ tend to $x$ as $h$ tends to 0. Then

$$\lim_{h \to 0} \frac{T_h u(x_h) - u(x_h)}{h} = 0.$$

In [4], Barles and Souganidis proved the convergence of any monotone, stable 
and consistent scheme. Proposition 2 and Lemma 5 are sufficient conditions to 
satisfy the hypotheses they gave. This directly ensures Theorem 3. 

3 Mean Curvature Motion and Median Filters 

This last section is devoted to a new proof that all properly rescaled and iterated weighted median filters, a class of isotropic morphological operators widely used in image processing, converge to the Mean Curvature Motion. This result has already been proved by Ishii (see [15]), generalizing the proofs by Barles and Georgelin ([3]) and Evans ([11]) and answering a conjecture of Bence, Merriman and Osher ([5]), but the tools we use here are different and perhaps better adapted to the mathematical morphology theory. Precisely, let $k$ be a continuous radial probability density that decreases fast enough at infinity (this will be made precise below) and define $\mathcal{B} = \{B \subset \mathbb{R}^N,\ \operatorname{meas}_k B \ge \tfrac{1}{2}\}$. Let us also define the weighted median filter associated with the density $k$ by

$$\operatorname{med}_k u(x) = \inf_{B \in \mathcal{B}} \sup_{y \in B} u(x + y).$$

Rescale $k$ into $k_h$ by $k_h(x) = h^{-N} k(h^{-1} x)$, and let $T_h = \operatorname{med}_{k_h}$. We prove the following

Theorem 4. Define

$$u_h(x, t) = T_h^n u_0(x) \quad \text{if } n h^2 \le t < (n+1) h^2.$$

Then (up to a linear rescaling of time), $u_h$ converges uniformly on every compact set towards the solution of the Mean Curvature Motion defined by

$$\frac{\partial u}{\partial t} - \left(\Delta u - \frac{\langle D^2 u\, Du, Du\rangle}{|Du|^2}\right) = 0$$

and with initial condition $u_0 \in BUC(\mathbb{R}^N)$.
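The inf-sup definition of $\operatorname{med}_k$ can be compared with the usual weighted median on a finite sample. In the sketch below (an illustration with hypothetical names, not the authors' code), the density $k$ is replaced by finitely many integer weights, as one would obtain by discretizing $k$ on a window. A brute-force inf over all subsets carrying at least half of the total weight is checked against the median read off the sorted cumulative weights.

```python
import itertools

import numpy as np


def weighted_median(values, weights):
    """Smallest value m such that the samples <= m carry at least half of the weight."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values, kind="stable")
    cum = np.cumsum(weights[order])
    idx = np.searchsorted(cum, 0.5 * cum[-1])  # first index with cum >= half
    return float(values[order][idx])


def inf_sup_over_half_mass(values, weights):
    """inf over index sets B with weight(B) >= 1/2 of sup over B (brute force, small inputs)."""
    total = float(sum(weights))
    best = np.inf
    for r in range(1, len(values) + 1):
        for B in itertools.combinations(range(len(values)), r):
            if sum(weights[i] for i in B) >= 0.5 * total:
                best = min(best, max(values[i] for i in B))
    return best


vals, w = [5, 1, 4, 2, 3], [1, 2, 3, 2, 1]
print(weighted_median(vals, w), inf_sup_over_half_mass(vals, w))
```

The optimal set in the inf-sup is always a lower level set $\{u \le t\}$ of the sample, which is why the two computations agree.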
The result has been proved by Guichard and Morel in [14] when $k$ is compactly supported. In this case, the median filter is really local and standard approximation arguments give the result. The other references above give a convergence result without this assumption. In this paper, we show that the formalism used by Guichard and Morel can also be adapted to this more general situation. Except in [15], the main part of the proof of convergence relies on consistency arguments. We also adopt this technique, as in the affine invariant case. As soon as $k$ is not compactly supported, it is not clear that the median filter only depends on the local features of the image. In [3] [5] [11], $k$ is a normalized Gaussian. As the decay is very fast in this case, we can expect the associated median filter to be local. Here we give weaker decay conditions and prove the same result. The lack of space prevents us from giving the detailed proofs, which the reader may find in [6].

As $k$ is radial, we can define a function $f$ by $f(|x|) = k(x)$. Assume that $f$ is continuous and that it is decreasing outside a bounded interval. The speed of decay is given by the condition

$$\int_0^{+\infty} f(t)\, t^{\gamma + N - 1}\, dt < +\infty,$$

where $\gamma$ is any number such that $\gamma > 3$. Set also

$$c(k) = -(N-1)\,\frac{\int_0^{+\infty} \cdots\, dt}{\int_0^{+\infty} \cdots\, dt}.$$

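We introduce another scale parameter $r = h^{\alpha}$, where $\frac{1}{2} < \alpha < 1$ is determined in the analysis. We now define two local operators $I_h^r$ and $S_h^r$:

$$I_h^r u(x) = \inf_{B \in \mathcal{B}} \sup_{y \in B \cap D(0,r)} u(x + y) \qquad (18)$$

and

$$S_h^r u(x) = \inf_{B \in \mathcal{B},\ B \subset D(0,r)} \sup_{y \in B} u(x + y). \qquad (19)$$

They obviously satisfy $I_h^r \le T_h \le S_h^r$. It suffices to prove that $I_h^r$ and $S_h^r$ are consistent with the same differential operator to obtain the same results for $T_h$.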
It is even sufficient to prove some inequalities in this case. Precisely, we prove 

Lemma 6. Let $p > 0$. Then

$$I_h^r\Big(px + ax^2 + \sum_{i=1}^{N-1} b_i y_i^2 + \sum_{i=1}^{N-1} c_i x y_i\Big) \ge c(k)\, \kappa\, h^2 + o(h^2), \qquad (20)$$

where $\kappa$ is the mean curvature at the origin.
Lemma 7.

$$S_h^r\Big(px + ax^2 + \sum_{i=1}^{N-1} b_i y_i^2 + \sum_{i=1}^{N-1} c_i x y_i\Big) \le c(k)\, \kappa\, h^2 + o(h^2). \qquad (21)$$

These lemmas provide consistency for functions, since it suffices to use translation invariance and rotation invariance (which holds because $k$ is radial) and to apply the results to the Taylor expansion of the function to analyze. We also get an error term of the form $o(h^2)$, which is uniform around a point where the gradient is not equal to zero. Gathering both previous lemmas, we get

Proposition 3. Let $u$ be a function. Assume that $Du(x) \ne 0$. Then

$$\frac{T_h u(x) - u(x)}{h^2} = c(k)\, \kappa + o(1),$$

the term $o(1)$ being uniform in a neighborhood of $x$.

Near critical points, we have to describe the behaviour of $T_h$ more precisely.

Lemma 8. Let $\varphi$ be a function. Let $x_0 \in \mathbb{R}^N$ be such that

$$D\varphi(x_0) = 0, \qquad D^2\varphi(x_0) = 0,$$

and let $x_h \to x_0$ when $h \to 0$. Then

$$\lim_{h \to 0} \frac{T_h \varphi(x_h) - \varphi(x_h)}{h^2} = 0.$$

To conclude, we apply the convergence results in [4] to prove Theorem 4. 

Acknowledgments. The author would like to thank Jean-Michel Morel and 
Abstract. We decompose images into “shapes”, based on connected components of level sets, which can be put in a tree structure. This tree contains the purely geometric information present in the image, separated from the contrast information. This structure makes it easy to suppress some shapes without affecting the others, which yields a peculiar kind of scale-space, where the information present at each scale is already present in the original image.



1 Introduction 

Depending on the problem at hand, different representations of images must be used. For deblurring, restoration and denoising purposes, representations based on the Fourier transform are well adapted, because they rely on the generation process of the image (Shannon theory for the sampling step) and on frequency models of degradations, for example concerning additive noise. Achieving a localization of the frequencies, wavelet decompositions [1,2] are known to be very efficient for image compression. These representations are said to be additive in the sense that they decompose the image on a given a priori basis of elementary images, and the image is represented as the weighted sum of the basis images, the weights being the coefficients of the decomposition. From the image analysis point of view, these representations are not necessarily as well adapted, because wavelets are not translation invariant, the Fourier transform is not local, and both yield quantized scales of observation.

Scale-space and edge detection theories represent images by “significant edges”, the image being smoothed (linearly or not [3,4]) and then convolved with an edge detector filter. This was first proposed by Marr [5] and then generalized by [6], whereas many developments were proposed for edge detection [7].

Extraction of “edges” was shown to be generally the output of a variational formulation [8,9]. The image is approximated by a function from a class for which the definition of edge becomes clear. The balance between the precision of the approximation and its complexity (which can be measured, for example, by the length of the edges) yields a multi-scale representation of the image. Despite its generality, this approach suffers from the absence of a universal model.

Scale-space representations based on edges are, however, incomplete (they do not allow the image to be reconstructed), and the images at different scales are redundant [10,11,2].

Furthermore, most of them do not take into account the fact that contrast may change strongly without much affecting our perception of images, a problem underlined and considered central by the mathematical morphology school [12,13]. This school proposes a parameter-free, complete and contrast-insensitive representation of an image by its level sets. A recent variant [14] proposed to take as basic elements the boundaries of the level sets (called level lines), a representation named the “topographic map”.

Our work [15] decomposes the image into connected components of level lines 
structured in a tree representing their geometrical inclusion. 

This tree makes it easy to compute the effect of a multi-scale operator, introduced in section 3, which is special because it proceeds by eliminating some level lines while keeping the others without smoothing them. The advantage is that important structures of the image are not damaged throughout the scale-space derived from this operator.

The pyramidal decomposition of the image given by the tree can also be 
seen as a region growing decomposition (see [9] and references therein), where 
two regions corresponding to interiors of nested level lines are merged when the 
smaller enclosed region is too small. But here, no edge is moved and no spurious 
edge is created. The operator proceeds by removing level lines, and the contrast 
between two adjacent regions cannot increase, so no new gray level is introduced. 

The paper is organized as follows: Section 2 is devoted to the decomposition of images by connected components of their level sets into an inclusion tree-like structure. Section 3 describes the natural multi-scale operator that simplifies this tree, which yields a “scale-space” representation of the image. Finally, section 4 shows some experiments.



2 Level Sets and Connected Components 

2.1 Contrast Insensitive Representations 

Here an image u is defined as a function from a rectangle $\Omega = [0, W] \times [0, H]$ to $\mathbb{R}$ that is constant on each “pixel” $(j, j+1) \times (i, i+1)$. The value attributed to each edgel $\{i\} \times (j, j+1)$ and $(i, i+1) \times \{j\}$ is the max of the values at the two adjacent pixels, and the value at each point $\{i\} \times \{j\}$ is the max value of u at the 4 adjacent pixels. It is convenient to extend the image to the plane by setting $u = u_0$ outside $\Omega$, where $u_0$ is an arbitrary fixed real value. This gives a continuously defined representation of a discrete array of pixels. Notice that with these conventions, u is upper semi-continuous.
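Given an image u, the upper level sets (noted $\mathcal{X}_\lambda$) and the lower level sets (noted $\mathcal{X}^\mu$) are defined as

$$\mathcal{X}_\lambda = \{x \in \mathbb{R}^2 : u(x) \ge \lambda\}, \qquad \mathcal{X}^\mu = \{x \in \mathbb{R}^2 : u(x) < \mu\}. \tag{1}$$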

u can be rebuilt from the data of either the family of upper or the family of lower level sets [12,16,17]:

$$u(x) = \sup\{\lambda : x \in \mathcal{X}_\lambda\} = \inf\{\mu : x \in \mathcal{X}^\mu\}. \tag{2}$$

The interest of these representations is their insensitivity to contrast changes: g(u) and u have the same families of level sets whenever g is a strictly increasing real function, which represents a global contrast change.
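On a discrete image, formula (2) and this contrast insensitivity can be checked directly. The following Python sketch is a minimal illustration, not part of the original method: it assumes a small integer image stored as nested lists, computes the upper level sets only at the gray levels present in the image (which suffices for a discrete image), and uses hypothetical helper names (`upper_sets`, `rebuild`).

```python
def upper_sets(u):
    """Map each gray level lam present in u to X_lam = {x : u(x) >= lam}."""
    levels = sorted({v for row in u for v in row})
    return {lam: frozenset((i, j) for i, row in enumerate(u)
                           for j, v in enumerate(row) if v >= lam)
            for lam in levels}


def rebuild(sets, height, width):
    """Formula (2): u(x) = sup{lam : x in X_lam}, a max over the finite set of levels."""
    return [[max(lam for lam, s in sets.items() if (i, j) in s)
             for j in range(width)] for i in range(height)]


u = [[0, 0, 0, 0],
     [0, 5, 5, 0],
     [0, 5, 9, 0],
     [0, 0, 0, 0]]
# An arbitrary strictly increasing contrast change g(v) = 3v + 1 (illustrative choice).
gu = [[3 * v + 1 for v in row] for row in u]

assert rebuild(upper_sets(u), 4, 4) == u
# g(u) has exactly the same family of upper level sets; only the level labels change.
assert set(upper_sets(u).values()) == set(upper_sets(gu).values())
```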

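A fundamental property of the level sets is their monotonicity:

$$\forall \lambda < \mu, \quad \mathcal{X}_\mu \subset \mathcal{X}_\lambda \quad \text{and} \quad \mathcal{X}^\lambda \subset \mathcal{X}^\mu. \tag{3}$$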

As in [14], to alleviate the global aspect of these basic elements, only connected components (cc)¹ will be used (they are invariant to a local contrast change):

$$\mathcal{X}_\lambda = \bigcup_{i=1}^{N_\lambda} cc_i(\mathcal{X}_\lambda), \qquad \mathcal{X}^\mu = \bigcup_{i=1}^{N^\mu} cc_i(\mathcal{X}^\mu).$$

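For cc's, relation (3) translates into:

$$\forall \lambda < \lambda', \ \forall i \in [1, N_{\lambda'}], \ \exists!\, j \in [1, N_\lambda] \ \text{s.t.} \ cc_i(\mathcal{X}_{\lambda'}) \subset cc_j(\mathcal{X}_\lambda). \tag{4}$$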

Indeed, $cc_i(\mathcal{X}_{\lambda'}) \subset \mathcal{X}_{\lambda'} \subset \mathcal{X}_\lambda$ and, since it is connected, it is included in some $cc_j$ of $\mathcal{X}_\lambda$, with j unique. Equation (4) yields a tree structure for the cc's of upper level sets (the same can be said of the cc's of lower level sets).

Actually, suppose the image takes its values in the discrete set $\{0, \ldots, U\}$ (typically, U = 255) and consider the graph whose nodes represent all cc's of all level sets $\mathcal{X}_0, \ldots, \mathcal{X}_U$. Let us write $N_\lambda^i$ for the node corresponding to $cc_i(\mathcal{X}_\lambda)$. Since a cc of an upper level set may be extracted from several upper level sets (when it is sufficiently contrasted w.r.t. its neighborhood), suppose, to avoid redundancy, that only the one with the greatest λ is kept. Then put a link between $N_\lambda^i$ and $N_{\lambda-1}^j$ whenever $cc_i(\mathcal{X}_\lambda) \subset cc_j(\mathcal{X}_{\lambda-1})$. This graph is in fact a tree $T_u$, whose root $N_0^1$ corresponds to $cc_1(\mathcal{X}_0)$, which contains every pixel. Equation (4) ensures that the graph is connected and without circuit, which are the two properties defining a tree. For each pixel P,

$$\bigcap_{\lambda, i \ \text{s.t.}\ P \in cc_i(\mathcal{X}_\lambda)} cc_i(\mathcal{X}_\lambda)$$

is not empty (since it contains P) and is an intersection of non-disjoint cc's of upper level sets, so that by (4) it is itself a cc of an upper level set; call $N_u(P)$ its associated node and let $\mathrm{Gray}(N_u(P))$ be the corresponding gray level λ from which it is extracted. Then we get

$$u(P) = \max\{\lambda : P \in \mathcal{X}_\lambda\} = \max\{\lambda : \exists i \in [1, N_\lambda] \ \text{s.t.}\ P \in cc_i(\mathcal{X}_\lambda)\} = \mathrm{Gray}(N_u(P)). \tag{5}$$

¹ Notice that with the conventions above, connectedness corresponds to 8-connectedness for upper level sets and 4-connectedness for lower level sets in the discretely defined image.

In other words, the image is reconstructed from the tree by attributing to each pixel the gray level of the smallest cc of upper level set that contains it. Thus the data of u is equivalent to the data of $T_u$ and of $N_u(P)$ for all pixels P. Notice that nothing obliges us to store the values Gray(N) when N is a node of $T_u$. If we do not want to store them but still want a reconstruction formula such as (5), we attribute to each node N a gray level that strictly decreases along the (unique) path from N to the root of $T_u$ (e.g. the depth of the node in $T_u$). Then we can easily verify that the image reconstructed from it is u modulo a local contrast change. Conversely, if v is u modulo a local contrast change, the trees of u and v are the same.
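The construction of $T_u$ and formula (5) can be sketched in a few lines of Python. This is a brute-force illustration assuming a small integer image stored as nested lists, not the authors' implementation, and the function names are hypothetical. Instead of linking nodes level by level, it links each node to its smallest strict superset, which is unique because, by (4), the cc's containing a given cc are nested.

```python
from collections import deque

# 8-connectivity, used for the cc's of upper level sets (footnote 1).
N8 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def components(pixels, offsets):
    """Split a set of (row, col) pixels into its connected components."""
    todo, comps = set(pixels), []
    while todo:
        seed = todo.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            y, x = queue.popleft()
            for dy, dx in offsets:
                q = (y + dy, x + dx)
                if q in todo:
                    todo.remove(q)
                    comp.add(q)
                    queue.append(q)
        comps.append(frozenset(comp))
    return comps


def upper_tree(u):
    """Return Gray(N) for every node N of T_u and the parent of each node (None = root)."""
    gray = {}
    for lam in sorted({v for row in u for v in row}):
        x_lam = {(i, j) for i, row in enumerate(u) for j, v in enumerate(row) if v >= lam}
        for cc in components(x_lam, N8):
            gray[cc] = lam  # levels ascend, so the greatest lam extracting cc is kept
    nodes = sorted(gray, key=len)
    parent = {n: next((m for m in nodes[k + 1:] if n < m), None) for k, n in enumerate(nodes)}
    return gray, parent


def rebuild(gray, height, width):
    """Formula (5): each pixel receives Gray of the smallest node containing it."""
    out = [[None] * width for _ in range(height)]
    for node in sorted(gray, key=len, reverse=True):  # paint large to small
        for i, j in node:
            out[i][j] = gray[node]
    return out


u = [[0, 0, 0, 0, 0],
     [0, 3, 3, 0, 0],
     [0, 3, 7, 0, 2],
     [0, 0, 0, 0, 0]]
gray, parent = upper_tree(u)
assert rebuild(gray, 4, 5) == u
```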




Fig. 1. Up: original image and associated trees $T_u$ (left) and $T_l$ (right) (arrows are directed from child to parent). Middle: cc's of upper level sets. Down: cc's of lower level sets.



All the above results can be stated, with the appropriate changes, for lower level sets, giving another tree $T_l$ and, for each pixel P, the node $N_l(P)$ associated with the smallest cc of lower level set containing P. An example of such a decomposition is given in fig. 1. This is what is done by [18].

2.2 The Inclusion Tree 

Each of the trees $T_u$ and $T_l$ satisfies the requirements stated in the introduction; nevertheless, they make an a priori choice of the “objects” in the image: $T_u$ is adapted to bright objects on a darker background. We would like to deal simultaneously with bright objects and with dark objects. It is not satisfactory to keep both trees, because they are redundant, each one individually being sufficient to represent the image. Thus, we have to eliminate some cc's from the trees. Two hypotheses will guide us:
1. The “interesting objects” in u stretch over a finite portion of the plane.

2. The “interesting objects” have no holes. 

Our “interesting objects” will be cc's of upper or lower level sets. The first hypothesis tells us to eliminate unbounded cc's of level sets; the second one allows us to build a new tree from the remaining nodes.




Fig. 2. C is a cc of a level set. Three level lines compose its border. The exterior level line is $J_2$. $\mathrm{Int}\, J_2 = C \cup \mathrm{Int}\, J_0 \cup \mathrm{Int}\, J_1$ is shown at the right.



Each bounded cc of level set C has a topological border composed of one or several sets of connected edgels, called level lines. A level line separates the plane into two disjoint connected parts², its interior (the bounded part) and its exterior (the unbounded one) [27]. C is contained in the interior of one of the level lines composing its frontier, called its exterior border, and in the exterior of all the others, called its interior borders (see fig. 2). The interiors of its interior borders constitute its holes. Hypothesis 2 leads us to consider not C but the union of C and its holes, which we call a “shape” S(C). The border of S(C) is now only the exterior border of C. Therefore, we are led to consider the interiors of level lines.
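The role of the connectivity convention of footnote 2 in defining holes can be seen on a toy mask. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes that the image border lies outside C, so that complement components reaching the border are treated as unbounded. It computes S(C) by adding to C the components of its complement that do not reach the border.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def components(pixels, offsets):
    """Split a set of (row, col) pixels into its connected components."""
    todo, comps = set(pixels), []
    while todo:
        seed = todo.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            y, x = queue.popleft()
            for dy, dx in offsets:
                q = (y + dy, x + dx)
                if q in todo:
                    todo.remove(q)
                    comp.add(q)
                    queue.append(q)
        comps.append(frozenset(comp))
    return comps


def shape(cc, height, width, hole_offsets):
    """S(C): C plus the components of its complement (taken with hole_offsets)
    that do not reach the image border, i.e. the holes of C."""
    grid = {(i, j) for i in range(height) for j in range(width)}
    filled = set(cc)
    for part in components(grid - cc, hole_offsets):
        if all(0 < i < height - 1 and 0 < j < width - 1 for i, j in part):
            filled |= part
    return frozenset(filled)


# An 8-connected set C whose center pixel touches the outside only diagonally.
mask = ["00000",
        "01100",
        "01010",
        "00110",
        "00000"]
C = frozenset((i, j) for i, row in enumerate(mask) for j, c in enumerate(row) if c == "1")
assert len(components(C, N8)) == 1        # C is a single cc in 8-connectedness
assert (2, 2) in shape(C, 5, 5, N4)       # complement in 4-connectedness: a hole
assert (2, 2) not in shape(C, 5, 5, N8)   # in 8-connectedness it would leak outside
```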







Fig. 3. The inclusion tree T corresponding to a simple image. Notice that D is a hole in F. Compare with the upper and lower trees $T_u$ and $T_l$ given in fig. 1.

² The connectedness considered here does not correspond to topological connectedness in the continuous plane $\mathbb{R}^2$, but to the discrete notions of connectedness (4- and 8-connectedness). More precisely, if C comes from an upper level set, the part of the plane containing C is taken in 8-connectedness and the other part in 4-connectedness, whereas it is the contrary if C is extracted from a lower level set.
A property similar to (4) can be proved for shapes, namely that if two shapes are not disjoint, one of them is included in the other. This relies on the fact that level lines do not cross: a level line cannot meet both the interior and the exterior of another level line. The proof of this is not trivial and involves hypotheses on the function (semi-continuity is a sufficient condition), see [21]. Our definition of image ensures such sufficient conditions.

The inclusion tree is constructed as follows. Associate a node with each shape. Take the entire plane (which is not a shape, because it is not bounded) as the root node. Put a link between two nodes whenever one of the shapes is included in the other and no third shape can be inserted between them. The resulting graph is a tree T, constructed from bounded cc's of both upper and lower level sets [15]. The “interesting objects” are now represented in one single tree (see fig. 3). With each pixel P, we also associate the node N(P) of T corresponding to the smallest shape containing P.

A reconstruction formula similar to (5) holds: $u(P) = \mathrm{Gray}(N(P))$.
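The inclusion tree can be sketched by brute force on a tiny image. The Python sketch below is only an illustration under stated assumptions, not the construction of [15]: it enumerates the cc's of all upper level sets (8-connected) and of all lower level sets (4-connected, written $\{u \le v\}$, i.e. $\mathcal{X}^\mu$ for μ just above v), fills their holes with the complementary connectivity, treats every cc touching the image border as unbounded, takes the gray level of the root to be the (assumed constant) border value, and links each shape to the smallest shape strictly containing it. All function names are hypothetical.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def components(pixels, offsets):
    """Split a set of (row, col) pixels into its connected components."""
    todo, comps = set(pixels), []
    while todo:
        seed = todo.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            y, x = queue.popleft()
            for dy, dx in offsets:
                q = (y + dy, x + dx)
                if q in todo:
                    todo.remove(q)
                    comp.add(q)
                    queue.append(q)
        comps.append(frozenset(comp))
    return comps


def shapes(u):
    """Map every shape of u (a bounded cc of a level set plus its holes) to its gray level."""
    h, w = len(u), len(u[0])
    grid = {(i, j) for i in range(h) for j in range(w)}

    def unbounded(s):  # assumption: a cc touching the image border is unbounded
        return any(i in (0, h - 1) or j in (0, w - 1) for i, j in s)

    def fill(cc, hole_offsets):
        return cc.union(*(p for p in components(grid - cc, hole_offsets) if not unbounded(p)))

    gray = {}
    for v in sorted({x for row in u for x in row}):
        for cc in components({p for p in grid if u[p[0]][p[1]] >= v}, N8):
            if not unbounded(cc):
                gray[fill(cc, N4)] = v            # upper cc: the greatest level is kept
        for cc in components({p for p in grid if u[p[0]][p[1]] <= v}, N4):
            if not unbounded(cc):
                gray.setdefault(fill(cc, N8), v)  # lower cc: the smallest level is kept
    return gray


def inclusion_tree(u):
    """Shapes with their gray levels and parents; None stands for the root (the plane)."""
    gray = shapes(u)
    nodes = sorted(gray, key=len)
    parent = {s: next((m for m in nodes[k + 1:] if s < m), None) for k, s in enumerate(nodes)}
    return gray, parent


def rebuild(gray, height, width, root_gray):
    """u(P) = Gray(N(P)), where N(P) is the smallest shape containing P."""
    out = [[root_gray] * width for _ in range(height)]
    for s in sorted(gray, key=len, reverse=True):  # paint large to small
        for i, j in s:
            out[i][j] = gray[s]
    return out


u = [[0, 0, 0, 0, 0, 0, 0],
     [0, 5, 5, 5, 0, 0, 0],
     [0, 5, 2, 5, 0, 9, 0],
     [0, 5, 5, 5, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0]]
gray, parent = inclusion_tree(u)
square = frozenset((i, j) for i in range(1, 4) for j in range(1, 4))
assert gray[square] == 5 and parent[square] is None   # bright square, its hole filled
assert parent[frozenset({(2, 2)})] == square          # dark shape from a lower set
assert rebuild(gray, 5, 7, root_gray=u[0][0]) == u
```

The enumeration revisits the whole image for every gray level, so it is only meant for checking small examples.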



2.3 Summary 

We consider functions made of closed upper level sets whose topological boundaries consist of a finite number of “level lines”. Such a class of functions contains discrete images (defined pixel-wise) and functions having a minimal regularity.

— We call shape the interior of a level line.

— Level lines are closed curves that do not cross.

— Thus two shapes are either disjoint, in which case they are contained in a third shape, or nested, in which case one is a descendant of the other. This yields a tree structure for the set of shapes, where the child-parent relation means topological inclusion.

3 Scale-Space Representation 

3.1 A Multi-Scale Operator 

For a set B, we denote by |B| its area, i.e. its Lebesgue measure, or any other measure that is increasing with respect to the inclusion of sets (if $B \subset C$ then $|B| \le |C|$). For a connected set B, we call its filled interior $\phi(B)$ the union of B and its holes, and its filled area the area of its filled interior. In other words, $\phi(B)$ is the smallest simply connected set containing B. Let $\mathcal{B}_t$ be the family of closed connected sets B whose filled interior contains the origin O, such that $|\phi(B)| \ge t$, and such that if O is in a hole H of B, then $|\phi(H)| < t$.

Let us introduce our multi-scale operator:

$$T_t u(x) = \sup_{B \in \mathcal{B}_t} \inf_{y \in x + B} u(y). \tag{6}$$
Applying the operator $T_t$ to the image u is equivalent to removing all the shapes of area strictly less than t from the inclusion tree of u and reconstructing the image. This yields another formulation of the operator³:

$$T'_t u(x) = \inf_{B \in \mathcal{B}_t} \sup_{y \in x + B} u(y). \tag{7}$$
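Removing the shapes of area less than t amounts to giving each pixel the gray level of the smallest remaining shape that contains it, so $T_t$ can be sketched by brute force. The Python sketch below is a hypothetical illustration, not the authors' implementation, under simplifying assumptions: upper cc's are 8-connected and lower cc's 4-connected, cc's touching the image border are treated as unbounded, the constant border value is used as the gray level of the root, and the area is counted in pixels.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def components(pixels, offsets):
    """Split a set of (row, col) pixels into its connected components."""
    todo, comps = set(pixels), []
    while todo:
        seed = todo.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            y, x = queue.popleft()
            for dy, dx in offsets:
                q = (y + dy, x + dx)
                if q in todo:
                    todo.remove(q)
                    comp.add(q)
                    queue.append(q)
        comps.append(frozenset(comp))
    return comps


def shapes(u):
    """Map every shape of u (a bounded cc of a level set plus its holes) to its gray level."""
    h, w = len(u), len(u[0])
    grid = {(i, j) for i in range(h) for j in range(w)}

    def unbounded(s):  # assumption: a cc touching the image border is unbounded
        return any(i in (0, h - 1) or j in (0, w - 1) for i, j in s)

    def fill(cc, hole_offsets):
        return cc.union(*(p for p in components(grid - cc, hole_offsets) if not unbounded(p)))

    gray = {}
    for v in sorted({x for row in u for x in row}):
        for cc in components({p for p in grid if u[p[0]][p[1]] >= v}, N8):
            if not unbounded(cc):
                gray[fill(cc, N4)] = v            # upper cc: the greatest level is kept
        for cc in components({p for p in grid if u[p[0]][p[1]] <= v}, N4):
            if not unbounded(cc):
                gray.setdefault(fill(cc, N8), v)  # lower cc: the smallest level is kept
    return gray


def grain_filter(u, t):
    """T_t u: drop every shape of area < t, then rebuild from the remaining tree."""
    h, w = len(u), len(u[0])
    out = [[u[0][0]] * w for _ in range(h)]       # root gray: assumed constant border value
    for s in sorted(shapes(u).items(), key=lambda kv: len(kv[0]), reverse=True):
        if len(s[0]) >= t:                        # area = number of pixels
            for i, j in s[0]:
                out[i][j] = s[1]
    return out


u = [[0, 0, 0, 0, 0, 0, 0],
     [0, 5, 5, 5, 0, 0, 0],
     [0, 5, 2, 5, 0, 9, 0],
     [0, 5, 5, 5, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0]]
t2 = grain_filter(u, 2)
assert t2[2][2] == 5 and t2[2][5] == 0               # both 1-pixel shapes are removed
assert grain_filter(t2, 2) == t2                      # idempotence (12) on this example
assert grain_filter(t2, 10) == grain_filter(u, 10)    # causality (8) with T_{t,s} = T_t
```

The dark pixel and the bright pixel are removed in the same way, which reflects the symmetric treatment of upper and lower level sets; these checks on one example do not, of course, prove the properties.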

This operator is at the same time a morphological opening and a closing [13]! It is close to a filter proposed independently by several authors [22,18,23,24], but in their case the filter was equivalent to removing nodes only from the tree of cc's of upper level sets or only from the tree of lower level sets, so that they obtain two different operators that do not commute (the opening and the closing versions). The operator presented here is the grain filter studied in [19].

3.2 Properties 

Let us consider the properties of this multi-scale operator. Some of the properties assume that the image is at least continuous, which is impossible (except in trivial cases) with the continuously defined versions of discretely defined images we considered. Nevertheless, although the notion of the inclusion tree is not clear in such a case, the operator $T_t$ can still be defined as in equation (6).

[Causality] The scale-space is causal, meaning that each scale can be deduced from any earlier scale by a transition operator:

$$\forall s, t, \ s < t, \quad T_t = T_{t,s} \circ T_s. \tag{8}$$

The transition operator is the operator itself: $T_{t,s} = T_t$.
[Monotonicity] This scale-space is monotone:

$$u \le v \implies \forall t, \ T_t u \le T_t v. \tag{9}$$

[Contrast covariance] If g is a contrast change (an increasing real-valued function), then

$$\forall t, \quad g \circ T_t = T_t \circ g. \tag{10}$$

[Negative covariance] Another interesting feature of the operator is its negative covariance, that is, it commutes with taking the negative of the image⁴:

$$\forall t, \quad T_t(-u) = -T_t u. \tag{11}$$

Notice that this is not the case for the operator defined in [18], where regional maxima and regional minima do not play symmetrical roles.

³ $T = T'$ if one switches the connectedness for upper and lower level sets. We conjecture that they are also equal when acting on continuous functions.

⁴ However, $-u$ is lower semi-continuous, so that appropriate changes of connectedness must be applied: lower (resp. upper) level sets of $-u$ must be considered in 8- (resp. 4-) connectedness.
[Local extrema conservation] A local regional extremum of the image u either remains a local regional extremum at scale t, or is included in a bigger local regional extremum, or disappears. In other words, regional extrema can grow, but they are never split during the scale-space, and the operator proceeds by growing local regional extrema. Moreover, at scale t each regional extremum of $T_t u$ contains a regional extremum of u. This implies that the number of regional extrema is a decreasing function of the scale. Notice that this property does not hold for the linear scale-space (convolution by a Gaussian) in two dimensions.

[Idempotence] The operator is idempotent:

$$\forall t, \quad T_t \circ T_t = T_t. \tag{12}$$

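[No asymptotic evolution] If u is $C^1$, there is no asymptotic evolution of the image. We have the two behaviors:

$$\forall x, \quad \nabla u(x) \ne 0 \implies \exists t > 0 \ \text{s.t.}\ \forall h < t, \ (T_h u - u)(x) = 0, \tag{13}$$

$$\forall x, \quad \text{if } \exists r > 0 \ \text{s.t.}\ \forall y \in D(x, r), \ \nabla u(y) = 0, \ \text{then } \exists t > 0 \ \text{s.t.}\ \forall h < t, \ (T_h u - u)(x) = 0, \tag{14}$$

where $D(x, r)$ denotes the disk of center x and radius r.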

[Conservation of T-junctions] Since the level lines of $T_t u$ are level lines of u, the T-junctions involving sufficiently large areas in u remain the same, without alteration. Notice that this is not the case for the other usual scale-spaces: it is clearly false for the linear scale-space and for the median filter (mean curvature motion), but also for the affine invariant morphological scale-space, as shown in [25].

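[Conservation of some regularity] If u is Lipschitz with constant k, so is $T_t u$, with a Lipschitz constant smaller than or equal to k. Indeed,

$$\forall x, z, \quad u(z) - k|x| \le u(z + x) \le u(z) + k|x|,$$

so that, for every set B,

$$\forall x, y, \quad \inf_{z \in y+B} u(z) - k|x| \le \inf_{z \in y+B} u(z + x) \le \inf_{z \in y+B} u(z) + k|x|,$$

$$\forall x, y, \quad \sup_{B \in \mathcal{B}_t} \inf_{z \in y+B} u(z) - k|x| \le \sup_{B \in \mathcal{B}_t} \inf_{z \in y+B} u(z + x) \le \sup_{B \in \mathcal{B}_t} \inf_{z \in y+B} u(z) + k|x|,$$

meaning, since $\inf_{z \in y+B} u(z + x) = \inf_{z \in x+y+B} u(z)$,

$$T_t u(y) - k|x| \le T_t u(x + y) \le T_t u(y) + k|x|.$$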

A similar argument shows that a uniformly continuous u remains uniformly continuous throughout the scale-space. We conjecture that if u is continuous, so is $T_t u$ for all t (a proof for the case of area opening and closing is given in [26]; we think our operator behaves similarly). Nevertheless, we cannot say more about regularization: it is not true that if u is smooth, so is $T_t u$. This scale-space is peculiar because it does not allow the results of differential operators to be estimated more reliably!

[Affine covariance] The operator commutes with all affine transforms of determinant 1:

$$\forall t, \ \forall A \in AG(\mathbb{R}^2), \quad T_t(u \circ A) = (T_{t|\det A|}\, u) \circ A. \tag{15}$$

Notice that equations (8), (9), (10) and (15) are properties that our scale-space shares only with the affine morphological scale-space. Nevertheless, the latter has an infinitesimal evolution law, whereas the former does not.

Remark: the geometrical covariance of the operator is linked to the geometrical invariance of the measure, here the area, under any affine transformation of determinant 1. With a different measure, invariant under another group of transformations preserving connectedness (so continuous transformations would probably be appropriate) and non-decreasing with respect to inclusion, our operator would commute with these transformations.



4 Experiments 

Fig. 4 illustrates that this scale-space differs from the one obtained by iterating area openings and closings (see [18]) with increasing area.

Different scales of the scale-space based on the inclusion tree are shown in 
fig. 5. Another example showing also the level lines is shown in fig. 6. Notice how 
the important structures of the image (in particular T-junctions) are preserved. 

The inclusion tree can also be used to remove impulse noise: assuming that speckle noise creates only small shapes, we represent the image at a sufficiently large scale (see fig. 7). This suppresses most of the noise without attempting to restore the image, so a subsequent treatment should follow [19].

Other uses of the inclusion tree are proposed in [15]. 









Fig. 4. Left: an image. Up: Successive removals of the cc’s of upper, then lower level 
sets with increasing area threshold. The black ring disappears before the white circle. 
Down: The image across the scales of the inclusion tree: the circles disappear according 
to their interior size. 
Fig. 5. Up-left: original image (650 × 429). Up-right: image at scale 50 (all shapes of area less than 50 pixels are removed). Down: image at scales 500 and 5000.



5 Summary and Conclusions 

The inclusion tree is a complete and non-redundant representation of images, insensitive to local changes of contrast. The basic elements are the interiors of connected components of level lines, called “shapes”. The tree structure represents geometrical inclusion and makes the representation easy to manipulate, for instance by removing some shapes, which is the fundamental operation. This yields a scale-space representation of the image which, contrary to most other scale-space representations, does not smooth the image but rather selects the information to keep at each scale. As a consequence, its field of application will be different from that of classical scale-spaces.

These shapes, appearing as natural geometrical contrast insensitive informa- 
tion, can also be used for various image analysis tasks, like image simplification, 
image comparison and registration [15,20]. 
Abstract. A morphological scale-space representation is presented, based on a morphological strong filter, the levelings. The scale properties are analysed and illustrated. From one scale to the next, details vanish, but the contours of the remaining objects are preserved sharp and perfectly localised. This paper is followed by a companion paper on PDE formulations of levelings.



1 Introduction 

In many circumstances, the objects of interest which have to be measured, segmented or recognised in an image belong to one scale, and all remaining objects, to be discarded, belong to another scale. In some cases, however, such a threshold in the scales is not possible, and the information of interest is present at several scales: it has to be extracted from various scales. For such situations, multiscale approaches have been developed, where a series of coarser and coarser representations of the same image is derived. The recognition or segmentation of the objects will then use the complete set of representations at various scales and not only the initial image.

A multiscale representation is completely specified once the transformations from a finer scale to a coarser scale have been defined. In order to reduce the freedom of choice, some properties of these transformations may be specified. Invariance properties are the most general:

— spatial invariance = invariance by translation 

— isotropy = invariance by rotation 

— invariance under a change of illumination: the transformation should com- 
mute with an increasing anamorphosis of the luminance 

One may add some requirements on the effect of the transformation itself: 

— The transformation should really be a simplification of the image. As such 
it will not be reversible: some information has to be lost from one scale to 
the next. 
— A particular form of simplification is expressed by the maximum principle: at any scale change, the maximal luminance at the coarser scale is always lower than the maximal intensity at the finer scale, and the minimum is always larger [1].

— Causality: coarser scales can only be caused by what happened at finer scales [2].

— It should not create new structures at coarser scales; the most frequent requirement is that it should not create new regional extrema [3] [4].

Furthermore, if the goal is image segmentation, one may require that the contours remain sharp and are not displaced. Finally, one has to care for the relations between the various scales. Many scale-space representations in the literature verify a semi-group property: if $g_\lambda$ is the representation at scale λ of image g, then the representation at scale μ of $g_\lambda$ should be the same as the representation at scale λ + μ of g: $g_{\lambda+\mu} = (g_\lambda)_\mu$. We will present another structure by introducing an order relation among scales.

Since one rarely adds images, there is no particular reason, except mathematical tractability, to ask for linear transforms. If one nevertheless chooses linearity, then various groups of the constraints listed above lead to the same solution: linear scale-space theory. The evolution of images with scale follows the physics of luminance diffusion: the rate of change of luminance with scale is equal to the divergence of the luminance gradient [2]. The discrete operator for changing scale is a convolution by a Gaussian kernel. Its major utility is to regularize the images, making it possible to compute derivatives: the spatial derivatives of the Gaussian are also solutions of the diffusion equation, and together with the zeroth-order Gaussian they form a complete family of differential operators. Besides this advantage, linear scale-space accumulates disadvantages. After convolution with a Gaussian kernel, the images are uniformly blurred, including the regions of particular interest such as the edges. Furthermore, the localisation of the structures of interest becomes extremely imprecise; if an object is found at one scale, one has to refine its contours along all finer scales. At very large scales, the objects are not recognisable at all, because of excessive blurring but also because of the appearance of spurious extrema in 2 dimensions. Various solutions have been proposed to reduce this problem. Perona and Malik were the first to propose a diffusion inhibited by high gradient values [5]. Weickert introduced a tensor-dependent diffusion [6]. Such approaches reduce the problems but do not eliminate them completely: spurious extrema may still appear.

Other non-linear scale-spaces consider the evolution of curves and surfaces as a function of their geometry. Among them we find the morphological approaches producing dilations of increasing size to represent the successive scales [7]. These approaches also have the disadvantage of displacing the boundaries. The first morphological scale-space approaches were the granulometries associated with a family of openings or of closings; openings operate only on the peaks and closings only on the valleys [8], [9]. They obey a semi-group relation: $(g_\lambda)_\mu = g_{\max(\lambda,\mu)}$. Using morphological openings also displaces the contours; however, openings and closings do not create spurious extrema. If one desires to preserve the contours, one uses openings and closings by reconstruction. If one desires a symmetric treatment of peaks and valleys, one uses alternate sequential filters, which are extremely costly in terms of computation, especially if one uses openings and closings by reconstruction [10] [11].

In this paper we present a new and extremely general non-linear scale-space representation with many interesting features. The most important of them is the preservation of contours. Furthermore, no spurious extrema appear. As a matter of fact, the transformation from one scale to the next, called leveling, respects all the criteria listed above, except that it is not linear. From one scale to the next, the structures of the image progressively vanish, becoming flat or “quasi-flat” zones; however, as long as they are visible, they keep exactly the same localisation as in the initial image. Levelings were introduced by F. Meyer. They have been studied by G. Matheron [12], F. Meyer [13], [14], and J. Serra [15].

In the first section, we present a characterisation and the scale-space properties of the simplest levelings. In a second section, we show how to transform any function g into a leveling of a function f. We also present extensions of levelings. The analysis of the algorithm for constructing levelings leads to a PDE formulation, presented in a second paper. In a last section, we illustrate the results.

2 Multiscale representation of images through levelings 

2.1 Flat and quasi-flat zones. 

We are working here on grey-tone functions defined on a digital grid G. We call $N_G(p)$ the set of neighbors of a pixel p. The maximal (resp. minimal) value of a function f within $N_G(p)$ represents the elementary dilation $\delta f$ (resp. erosion $\varepsilon f$) of the function f at pixel p.
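The elementary dilation and erosion are local max and min filters. The Python sketch below is an illustration under two assumptions that the text does not fix: the grid is 8-connected, and the neighbourhood of p includes p itself (so that $\varepsilon f \le f \le \delta f$). The function name `elementary` is hypothetical.

```python
# Assumption: 8-connected grid; p itself is added to its neighbourhood below.
N8 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def elementary(f, op, offsets=N8):
    """Apply op (max for the dilation, min for the erosion) over p and its grid neighbours."""
    h, w = len(f), len(f[0])
    return [[op([f[i][j]] + [f[i + dy][j + dx] for dy, dx in offsets
                             if 0 <= i + dy < h and 0 <= j + dx < w])
             for j in range(w)] for i in range(h)]


f = [[0, 0, 0, 0],
     [0, 7, 0, 0],
     [0, 0, 0, 3]]
delta, eps = elementary(f, max), elementary(f, min)
assert delta == [[7, 7, 7, 0], [7, 7, 7, 3], [7, 7, 7, 3]]
assert all(v == 0 for row in eps for v in row)
```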

A path P of cardinal n between two pixels p and q on the grid G is an n-tuple of pixels $(p_1, p_2, \ldots, p_n)$ such that $p_1 = p$ and $p_n = q$, and for all i, $(p_i, p_{i+1})$ are neighbors.

We will see that simple levelings are a subclass of connected operators [16], which means that they extend flat zones and do not create new contours. More general levelings will extend quasi-flat zones, defined as follows.

Definition 1. Two pixels x, y belong to the same R-flat-zone of a function f if and only if there exists an n-tuple of pixels $(p_1, p_2, \ldots, p_n)$ such that $p_1 = x$ and $p_n = y$, and for all i, $(p_i, p_{i+1})$ are neighbours and verify the symmetrical relation $f_{p_i} \, R \, f_{p_{i+1}}$.

The simplest symmetrical relation R is equality, $f_{p_i} = f_{p_{i+1}}$, for which the quasi-flat zones are flat. As an example of a more complex relation R, let us define, for two neighbouring pixels p and q, $f_p \, R \, f_q$ by $|f_p - f_q| \le \lambda$. This relation is symmetrical and defines quasi-flat zones with a maximal slope equal to λ.
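Definition 1 amounts to a region growing along neighbour paths. The following Python sketch assumes 4-connected neighbours (the text leaves the grid unspecified) and the relation $|f_p - f_q| \le \lambda$; the function name is hypothetical. It also shows that two pixels may lie in the same quasi-flat zone even if they differ by more than λ, as long as a path of λ-close neighbours links them.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumption: 4-connected grid


def quasi_flat_zones(f, lam, offsets=N4):
    """Label the zones of Definition 1 for the relation |f_p - f_q| <= lam
    (lam = 0 gives the flat zones). Returns a label image and the number of zones."""
    h, w = len(f), len(f[0])
    label = [[None] * w for _ in range(h)]
    n = 0
    for i in range(h):
        for j in range(w):
            if label[i][j] is not None:
                continue
            label[i][j] = n
            queue = deque([(i, j)])
            while queue:  # grow the zone along paths of lam-close neighbours
                y, x = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and label[ny][nx] is None
                            and abs(f[ny][nx] - f[y][x]) <= lam):
                        label[ny][nx] = n
                        queue.append((ny, nx))
            n += 1
    return label, n


f = [[0, 1, 2, 10, 10]]
assert quasi_flat_zones(f, 0)[1] == 4  # flat zones: {0}, {1}, {2}, {10, 10}
assert quasi_flat_zones(f, 1)[1] == 2  # 0 and 2 are linked through 1
assert quasi_flat_zones(f, 8)[1] == 1
```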

2.2 Characterisation of levelings 

We will define a non-linear scale-space representation of images based on levelings. An image g will be a representation of an image f at a coarser scale if g is a leveling of f, characterised by the following definition.
Definition 2. An image g is a leveling of the image f iff for all neighbors (p, q):

$$g_p > g_q \implies f_p \ge g_p \ \text{and} \ g_q \ge f_q$$

Remark 1. If the function g is constant, no couple of neighboring pixels (p, q) can be found for which $g_p > g_q$. Hence the implication $\{g_p > g_q \implies f_p \ge g_p \text{ and } g_q \ge f_q\}$ is always true, showing that a flat function is a leveling of any other function.

The relation {g is a leveling of f} will be written $g \prec f$. The characterisation of levelings using neighboring points is illustrated in fig. 1b. In [14] we have shown that adopting a different order relation, giving a new meaning to $g_p > g_q$, leads to larger classes of levelings.
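Definition 2 can be checked directly, pixel by pixel. The following Python sketch assumes 4-connected neighbours on a rectangular grid (the text does not fix the neighbourhood) and uses a hypothetical function name. On a one-line example, it illustrates Remark 1, the transitivity of the relation, and the fact that a smoothing such as a blur is generally not a leveling.

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumption: 4-connected grid


def is_leveling(g, f, offsets=N4):
    """Definition 2: for all neighbours (p, q), g_p > g_q implies f_p >= g_p and g_q >= f_q."""
    h, w = len(g), len(g[0])
    for i in range(h):
        for j in range(w):
            for dy, dx in offsets:
                k, m = i + dy, j + dx
                if 0 <= k < h and 0 <= m < w and g[i][j] > g[k][m]:
                    if not (f[i][j] >= g[i][j] and g[k][m] >= f[k][m]):
                        return False
    return True


f = [[0, 5, 9, 5, 0, 0, 3, 0]]
g1 = [[0, 5, 5, 5, 0, 0, 3, 0]]    # the peak 9 is flattened to 5
g2 = [[0, 5, 5, 5, 0, 0, 0, 0]]    # the small peak 3 vanishes as well
blur = [[1, 4, 7, 4, 1, 1, 2, 1]]  # a smoothed version of f
assert is_leveling(g1, f) and is_leveling(g2, g1) and is_leveling(g2, f)  # g2 < g1 < f
assert is_leveling([[2] * 8], f)   # Remark 1: a flat function
assert not is_leveling(blur, f)    # blur_1 = 4 < f_1 = 5 below the transition 4 -> 7
```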

2.3 Properties of levelings 

Algebraic properties. If two functions $g_1$ and $g_2$ are both levelings of the same function f, then $g_1 \vee g_2$ and $g_1 \wedge g_2$ are both levelings of f. This property makes it possible to associate new levelings with families of levelings. In particular, if $(g_i)$ is a family of levelings of f, the morphological centre $(f \vee \bigwedge_i g_i) \wedge \bigvee_i g_i$ of this family is also a leveling of f.

Invariance properties. In the introduction, we have listed a number of desirable properties of transformations on which to build a scale-space. They are obviously satisfied by levelings:

— invariance by spatial translation

— isotropy: invariance by rotation

— invariance to a change of illumination: g being a leveling of f, if g and f are submitted to the same increasing anamorphosis, then the transformed function g' will still be a leveling of the transformed function f'.

Relation between 2 scales. Levelings will really construct a scale-space when a true simplification of the image occurs between two scales. Let us now characterize the type of simplifications implied by levelings.

In this section we always suppose that g is a leveling of f. As shown by the definition, if there is a transition $g_p > g_q$ of the function g between two neighboring pixels, then there exists a transition at least as large between $f_p$ and $f_q$, as $f_p \ge g_p > g_q \ge f_q$. In other words, to any contour of the function g corresponds a stronger contour of the function f at the very same location, and the localisation of this contour is exactly the same. This bracketing of each transition of the function g by a transition of the function f also shows that the “causality principle” is verified: coarser scales can only be caused by what happened at finer scales.

Furthermore, if we exclude the case where g is a completely flat function, then the “maximum principle” is also satisfied: at any scale change, the maximal luminance at the coarser scale is always lower than the maximal intensity at the finer scale, and the minimum is always larger.
Let us now analyse what happens on the zones where the leveling g departs from the function f. Let us consider two neighboring points (p, q) for which $f_p > g_p$ and $f_q > g_q$. For such a couple of pixels, the second half of the definition, $f_p \ge g_p$ and $g_q \ge f_q$, is false, showing that the first half must also be false: $g_p \le g_q$. By symmetry we also have $g_p \ge g_q$, and hence $g_p = g_q$. This means that if g is a leveling of f, the connected components of the anti-extensivity zones $\{f > g\}$ are necessarily flat. By duality, the same holds for the extensivity zones $\{f < g\}$.

The last criterion, “no new extrema at larger scales”, is also satisfied, as shown in the following section.



Life and death of the regional minima and maxima. Levelings are a particular case of monotone planings:

Definition 3. An image g is a monotone planing of the image f iff for all neighbors (p, q):

$$g_p > g_q \implies f_p > f_q$$



Theorem 1. A monotone planing does not create regional minima or maxima. In other words, if g is a monotone planing of f, and if g has a regional minimum (resp. maximum) X, then f possesses a regional minimum (resp. maximum) $Z \subset X$.

Hint of the proof: If X is a regional minimum of g, all its neighbors have a higher altitude. To these increasing transitions correspond increasing transitions of f. It is then easy to show that the lowest pixel for f within X belongs to a regional minimum Z of f included in X.
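Theorem 1 can be illustrated in 1D. The sketch below assumes 1D lists with neighbours i and i+1 and treats a regional minimum as a maximal flat run whose outer neighbours (where they exist) are all strictly higher; this border convention and the function names are assumptions for illustration. In the example, g merges two regional minima of f into one, but every regional minimum of g still contains a regional minimum of f.

```python
def is_monotone_planing(g, f):
    """Definition 3 on a 1D signal: g_p > g_q  =>  f_p > f_q for neighbours p, q."""
    for i in range(len(f) - 1):
        for p, q in ((i, i + 1), (i + 1, i)):
            if g[p] > g[q] and not f[p] > f[q]:
                return False
    return True


def regional_minima(h):
    """Maximal flat runs (a, b) whose outer neighbours, if any, are strictly higher."""
    runs, a, n = [], 0, len(h)
    while a < n:
        b = a
        while b + 1 < n and h[b + 1] == h[a]:
            b += 1
        left_ok = a == 0 or h[a - 1] > h[a]
        right_ok = b == n - 1 or h[b + 1] > h[a]
        if left_ok and right_ok:
            runs.append((a, b))
        a = b + 1
    return runs


f = [4, 1, 3, 0, 5]
g = [4, 1, 1, 1, 5]      # the two minima of f are merged into one flat minimum
print(is_monotone_planing(g, f))             # True
print(regional_minima(g), regional_minima(f))  # [(1, 3)] [(1, 1), (3, 3)]
# Every regional minimum of g contains a regional minimum of f:
print(all(any(a <= c and d <= b for c, d in regional_minima(f))
          for a, b in regional_minima(g)))   # True
```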



Relations between multiple scales: preorder relation. We now have to consider the relations between multiple scales. Until now, we have presented how levelings simplify images. For speaking about scales, we need some structure among scales. This structure is a lattice structure. Being a leveling is in fact an order relation, as shown by the following two lemmas.

Lemma: The relation {g is a leveling of f} is reflexive and transitive: it is a preorder relation.

Lemma: The family of levelings, from which we exclude the trivial constant functions, verifies the anti-symmetry relation: if f is a non-constant function and a leveling of g, and simultaneously g is a leveling of f, then f = g.

Being an anti-symmetric preorder relation, the relation {g is a leveling of f} is an order relation, except for functions which are constant everywhere. With the help of this order relation, we are now able to construct a multiscale representation of an image in the form of a series of levelings $(g_0 = f, g_1, \dots, g_n)$, where $g_k$ is a leveling of $g_{k-1}$; as a consequence of transitivity, $g_k$ is also a leveling of each function $g_l$ for $l < k$.
3 Construction of the levelings

3.1 A criterion characterizing levelings 

It will be fruitful to consider the levelings as the intersection of two larger classes: 
the lower levelings and the upper levelings, defined as follows. 

Definition 4. A function g is a lower-leveling of a function f if and only if for any couple of neighbouring pixels (p, q): $g_p > g_q \Rightarrow g_q \ge f_q$



Definition 5. A function g is an upper-leveling of a function f if and only if for any couple of neighbouring pixels (p, q): $g_p > g_q \Rightarrow g_p \le f_p$



The name “upper-leveling” comes from the fact that all connected components where g > f are flat: for any couple of neighbouring pixels (p, q):

$$(g_p > f_p \text{ and } g_q > f_q) \Rightarrow g_p = g_q$$

Similarly, if g is a lower-leveling of f, then all connected components where g < f are flat.

Obviously, a function g is a leveling of a function f if and only if it is both an upper and a lower leveling of the function f. Let us now propose an equivalent formulation for the lower levelings:

Criterion: A function g is a lower-leveling of a function f if and only if, for each pixel q with a neighbour p verifying $g_p > g_q$, the relation $g_q \ge f_q$ is satisfied.

But the pixels with this property are exactly those for which the dilation δ increases the value: $g_q < \delta_q g$. This leads to a new criterion:

Criterion: A function g is a lower-leveling of a function f if and only if: $g_q < \delta_q g \Rightarrow g_q \ge f_q$

Recalling that the logical meaning of [A ⇒ B] is [not A or B], we may interpret [$g_q < \delta_q g \Rightarrow g_q \ge f_q$] as [$g_q \ge \delta_q g$ or $g_q \ge f_q$], or in an equivalent manner [$g_q \ge f_q \wedge \delta_q g$]. This gives the following criterion:

Criterion Low: A function g is a lower-leveling of a function f if and only if: $g \ge f \wedge \delta g$

In a similar way we derive a criterion for upper levelings:

Criterion Up: A function g is an upper-leveling of a function f if and only if: $g \le f \vee \varepsilon g$

Putting everything together yields a criterion for levelings:

Criterion: A function g is a leveling of a function f if and only if: $f \wedge \delta g \le g \le f \vee \varepsilon g$ (see [12]).
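The equivalence between the leveling definition and the criterion $f \wedge \delta g \le g \le f \vee \varepsilon g$ can be checked exhaustively on small random signals. In this sketch, δ and ε are assumed to be the flat dilation and erosion by the elementary segment {i-1, i, i+1}, truncated at the signal borders; all names are hypothetical.

```python
import random


def dilate(h):
    """Flat dilation by {i-1, i, i+1}; at the borders only existing neighbours are used."""
    return [max(h[max(i - 1, 0):i + 2]) for i in range(len(h))]


def erode(h):
    """Flat erosion by {i-1, i, i+1}; at the borders only existing neighbours are used."""
    return [min(h[max(i - 1, 0):i + 2]) for i in range(len(h))]


def leveling_by_definition(g, f):
    """g_p > g_q  =>  f_p >= g_p and g_q >= f_q, for all neighbouring pairs."""
    return all(not g[p] > g[q] or (f[p] >= g[p] and g[q] >= f[q])
               for i in range(len(f) - 1) for p, q in ((i, i + 1), (i + 1, i)))


def leveling_by_criterion(g, f):
    """f ∧ δg <= g <= f ∨ εg, pointwise."""
    return all(min(fq, dq) <= gq <= max(fq, eq)
               for fq, gq, dq, eq in zip(f, g, dilate(g), erode(g)))


rng = random.Random(0)
for _ in range(5000):
    n = rng.randint(1, 6)
    f = [rng.randint(0, 3) for _ in range(n)]
    g = [rng.randint(0, 3) for _ in range(n)]
    assert leveling_by_definition(g, f) == leveling_by_criterion(g, f)
print("criterion agrees with definition on all random cases")
```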



3.2 Openings and closings by reconstruction 

We recall that a function g is an opening (resp. closing) by reconstruction of a function f if $g = f \wedge \delta g$ (resp. $g = f \vee \varepsilon g$). As it verifies criterion Low (resp. Up), such a function g is then a lower (resp. upper) leveling of f. The converse is also true for a lower (resp. upper) leveling g verifying $g \le f$ (resp. $g \ge f$). Hence:
Proposition 1. g is an opening (resp. closing) by reconstruction of a function f if and only if g is a lower (resp. upper) leveling of f verifying $g \le f$ (resp. $g \ge f$).

Using this characterisation, we may particularize the initial definition of lower levelings to the case where $f \ge g$:

Proposition 2. g is an opening by reconstruction of a function f if and only if $g \le f$ and for any couple of neighbouring pixels (p, q): $g_p > g_q \Rightarrow g_q = f_q$.

Proposition 3. g is a closing by reconstruction of a function f if and only if $g \ge f$ and for any couple of neighbouring pixels (p, q): $g_p > g_q \Rightarrow g_p = f_p$.

Remark 2. If g is a (lower) leveling of f, then $g \wedge f$ is a lower leveling of f verifying $g \wedge f \le f$, i.e. an opening by reconstruction. Similarly, if g is an upper leveling of f, then $g \vee f$ is a closing by reconstruction.
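A common way of computing an opening by reconstruction is to iterate $g \leftarrow f \wedge \delta g$ from a marker below f until nothing changes; the fixed point then satisfies $g = f \wedge \delta g$. The sketch below does this for 1D lists (elementary segment {i-1, i, i+1} as neighbourhood, hypothetical names) and checks Proposition 2 on the result. The closing by reconstruction would be obtained dually by iterating $g \leftarrow f \vee \varepsilon g$.

```python
def dilate(h):
    """Flat dilation by {i-1, i, i+1}; at the borders only existing neighbours are used."""
    return [max(h[max(i - 1, 0):i + 2]) for i in range(len(h))]


def opening_by_reconstruction(f, marker):
    """Iterate g <- f ∧ δg, starting from f ∧ marker, until g = f ∧ δg."""
    g = [min(a, b) for a, b in zip(f, marker)]
    while True:
        new = [min(a, b) for a, b in zip(f, dilate(g))]
        if new == g:
            return g
        g = new


def satisfies_prop2(g, f):
    """Proposition 2: g <= f and, for neighbours, g_p > g_q  =>  g_q = f_q."""
    if any(a > b for a, b in zip(g, f)):
        return False
    return all(not g[p] > g[q] or g[q] == f[q]
               for i in range(len(f) - 1) for p, q in ((i, i + 1), (i + 1, i)))


f = [1, 4, 4, 1, 3, 3, 1]
marker = [0, 0, 2, 0, 0, 0, 0]   # marks only the first peak, at height 2
g = opening_by_reconstruction(f, marker)
print(g)                          # [1, 2, 2, 1, 1, 1, 1]
print(satisfies_prop2(g, f), satisfies_prop2(marker, f))   # True False
```

The marked peak is reconstructed up to the marker height, while the unmarked peak disappears entirely.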



3.3 An algorithm for constructing levelings 

We finally adopt the following general criterion of levelings:

Criterion: A function g is a leveling of a function f if and only if: $f \wedge \alpha g \le g \le f \vee \beta g$, where α is an extensive operator ($\alpha g \ge g$) and β an anti-extensive operator ($\beta g \le g$).

With the help of this criterion, we may turn any function g into a leveling of a function f. We will call the function f the reference function and the function g the marker function. Given two functions g and f, we want to transform g into a leveling of f. If g is not a leveling of f, then the criterion [$f \wedge \alpha g \le g \le f \vee \beta g$] is false for at least one pixel p. The criterion is not verified in two cases:

- $g_p < f_p \wedge \alpha_p g$. Hence the smallest modification of $g_p$ for which the criterion becomes true is $g'_p = f_p \wedge \alpha_p g$. We remark that $g_p < g'_p \le f_p$.

- $g_p > f_p \vee \beta_p g$. Hence the smallest modification of $g_p$ for which the criterion becomes true is $g'_p = f_p \vee \beta_p g$. We remark that $g_p > g'_p \ge f_p$.

We remark that on {g = f} the criterion is always satisfied. Hence another formulation of the algorithm:

- lev⁻: on {g < f} do $g = f \wedge \alpha g$.

- lev⁺: on {g > f} do $g = f \vee \beta g$.

It is easy to check that this algorithm amounts to replacing g everywhere by the new value $g = (f \wedge \alpha g) \vee \beta g = (f \vee \beta g) \wedge \alpha g$.

We repeat the algorithm until the criterion is satisfied everywhere. We are sure that the algorithm will converge, since the modifications of g are pointwise monotone: the successive values of g get closer and closer to f until convergence.
In order to optimize the speed of the algorithm, we first apply a single parallel step of the algorithm, $g = (f \wedge \alpha g) \vee \beta g$. After this first step, the two algorithms [lev⁻] and [lev⁺] have no effect on each other and may be used in any order.

In particular, one may use them as sequential algorithms in which the new value of any pixel is used for computing the values of its neighboring pixels. This may be done during alternating raster scans, a direct scan from top to bottom and left to right being followed by an inverse scan from bottom to top and right to left. Alternatively, hierarchical queues may be used, allowing the pixels to be processed in decreasing order on {g < f} and in increasing order on {g > f}.
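The construction can be sketched for a flat leveling (α = δ, β = ε) on 1D lists. For simplicity, this sketch only repeats the parallel step $g \leftarrow (f \wedge \delta g) \vee \varepsilon g$ until g stops changing; it does not implement the faster sequential raster scans or hierarchical queues described above, and the names are hypothetical.

```python
def dilate(h):
    """Flat dilation by {i-1, i, i+1}; at the borders only existing neighbours are used."""
    return [max(h[max(i - 1, 0):i + 2]) for i in range(len(h))]


def erode(h):
    """Flat erosion by {i-1, i, i+1}; at the borders only existing neighbours are used."""
    return [min(h[max(i - 1, 0):i + 2]) for i in range(len(h))]


def leveling(f, marker, max_iter=10_000):
    """Flat leveling of the reference f built from the marker (alpha = δ, beta = ε).

    Repeats the parallel step g <- (f ∧ δg) ∨ εg until g no longer changes.
    """
    g = list(marker)
    for _ in range(max_iter):
        new = [max(min(fq, dq), eq) for fq, dq, eq in zip(f, dilate(g), erode(g))]
        if new == g:
            return g
        g = new
    raise RuntimeError("no convergence within max_iter steps")


print(leveling([0, 3, 5, 4, 2, 6, 1], [0, 1, 2, 3, 2, 1, 0]))   # [0, 3, 3, 3, 2, 2, 1]
print(leveling([5, 1, 0, 2, 6], [5, 4, 5, 4, 6]))               # [5, 4, 4, 4, 6]
```

In the second call the marker lies above the reference in the valley of f, and the leveling fills that valley with a flat zone at height 4, as expected on {g > f}.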

Fig. 1a illustrates how a marker function h is transformed until it becomes a function g which is a leveling of f. This leveling uses the dilation δ for α and the erosion ε for β. On {h < f}, the leveling increases h as little as possible until a flat zone is created or the function g hits the function f: hence on {g < f}, the function g is flat. On {h > f}, the leveling decreases h as little as possible until a flat zone is created or the function g hits the function f: hence on {g > f}, the function g is also flat. For more general levelings, quasi-flat zones are created.




Fig. 1. a) f = reference function; h = marker function; g = associated leveling; b) characterisation of levelings on the transition zones.



If g is not modified while applying this complete algorithm to a couple of functions (f, g), then g is a leveling of f. If, on the other hand, g is modified, one repeats the same algorithm until convergence, as explained above.

3.4 Robustness of levelings 

In this section, we will see that levelings are particularly robust: they are strong morphological filters. We recall that an operator φ is called a morphological filter if it is:

- increasing: $g \ge h \Rightarrow \phi g \ge \phi h$. This implies that $\phi(h \wedge k) \le \phi h \wedge \phi k$ and $\phi(h \vee k) \ge \phi h \vee \phi k$;

- idempotent: $\phi\phi = \phi$. This means that the operator is stable: it is sufficient to apply it once in order to get the result (for instance, the median filter, which is not a morphological filter, is not stable: it may oscillate when iterated).
It is strong if, furthermore, $\phi(\mathrm{Id} \vee \phi) = \phi(\mathrm{Id} \wedge \phi) = \phi$, where Id represents the identity operator. This property means that all functions within a given range yield the same result: for any function h verifying $f \wedge \phi f \le h \le f \vee \phi f$, we have $\phi f = \phi h$.

In our case, we define an operator $\Lambda_g(f)$ which constructs the leveling of the marker g with reference function f. For a fixed function g and varying f, this operator is a strong morphological filter. If we call $\nu_g^-(f)$ the opening by reconstruction and $\nu_g^+(f)$ the closing by reconstruction of f based on the marker g, it can be shown that $\Lambda_g(f) = \nu_g^+(\nu_g^-(f)) = \nu_g^-(\nu_g^+(f))$: an opening followed by a closing and simultaneously a closing followed by an opening, a sufficient condition for a leveling to be a strong morphological filter. We use this property to show that yet another scale-space dimension exists, based on levelings. We use here a family of leveling operators, based on a family $(\alpha_i)$ of extensive dilations and the family of adjunct erosions $(\beta_i)$, verifying $\alpha_i \le \alpha_j$ and $\beta_i \ge \beta_j$ for $i > j$. We call $\Lambda_i$ the leveling built with $\alpha_i$ and $\beta_i$. Then, using the same marker g and the same reference function f, we obtain a family of increasing levelings: for $i > j$, the leveling $\Lambda_i(f; g)$ is a leveling of $\Lambda_j(f; g)$.

4 Illustration 

Levelings depend upon several parameters. First of all, the type of leveling has to be chosen; this depends upon the choice of the operators α and β. Fig. 2 presents three different levelings, applied to the same reference and marker image. The operators α and β used for producing them are, from left to right: 1) $\alpha = \delta$, $\beta = \varepsilon$; 2) $\alpha = \mathrm{Id} \vee (\delta - 1)$, $\beta = \mathrm{Id} \wedge (\varepsilon + 1)$; 3) $\alpha = \mathrm{Id} \vee \gamma\delta$, $\beta = \mathrm{Id} \wedge \varphi\varepsilon$, where γ and φ are respectively an opening and a closing. In Fig. 3, a flat leveling based on δ and ε is applied to the same reference image (in the centre of the figure), using different markers produced by an alternate sequential filter applied to the reference image: “marker 1” using disks as structuring elements, and “marker 2” using line segments.

The last series of illustrations (Fig. 4) shows how levelings may be used to derive a multiscale representation of an image. We use as markers alternate sequential filters with disks: $m_0$ = original image; $m_i = \varphi_i \gamma_i m_{i-1}$. The levelings are produced in the following manner: $l_0$ is the original image and $l_i$ is the leveling obtained by taking the image $l_{i-1}$ as reference and the image $m_i$ as marker. In this case, the resulting levelings inherit the semi-group property of the markers [17]. The illustrations are arranged in rows, each showing a marker, the original image and the corresponding leveling: ($m_1$, original, $l_1$), ($m_3$, original, $l_3$), ($m_5$, original, $l_5$).

5 Conclusion 

A morphological scale-space representation has been presented, with all the desirable features of a scale-space. It has been applied successfully to reduce the bitstream of an MPEG-4 encoder, when the simplified sequence replaces the original sequence. In this case, a sliding temporal window is processed and treated as a 3D volume, with two spatial dimensions and one temporal dimension: 3D markers and 3D levelings are then used. Another important application is the simplification of images prior to segmentation. Since levelings enlarge flat zones, these flat zones may be used as seeds for a segmentation algorithm.

Fig. 2. Three different levelings (leveling 1, leveling 2, leveling 3), applied to the same reference and marker image.

Fig. 3. The same leveling applied to the same reference image with distinct marker images.

Fig. 4. Illustration of a multiscale representation.
Abstract. The partial differential equations describing the propagation of (wave) fronts in space are closely connected with the morphological erosion and dilation. Strangely enough, this connection has not been explored in the derivation of numerical schemes to solve the differential equations. In this paper the morphological facet model is introduced, in which an analytical function is locally fitted to the data. This function is then dilated analytically with an infinitesimally small structuring element. These sub-pixel dilations form the core of the numerical solution schemes presented in this paper. One of the simpler morphological facet models leads to a numerical scheme that is identical to a well-known classical upwind finite difference scheme. Experiments show that the morphological facet model provides stable numerical solution schemes for these partial differential equations.



1 Introduction 

The partial differential equations describing the propagation of fronts in space are known to be closely connected with the morphological erosion and dilation. These morphological partial differential equations (henceforth abbreviated as PDE’s), known from the work of Alvarez [1], Maragos [2], Matiolli [3] and van den Boomgaard [4], have gained considerable interest in the past as canonical descriptions of evolutionary shape deformation (see Osher and Sethian [5], Sapiro [6] and Kimia [7]). Strangely enough, the realization that these PDE’s are solved by morphological operations has not been explored in the development of numerical schemes to solve these differential equations. This paper builds on a previous paper [8], in which we briefly introduced the morphological facet model as a tool to construct numerical schemes for these PDE’s, and deals with the subject in more detail.

In the morphological facet model, an analytical function is locally fitted to the data. This function is then dilated analytically with an infinitesimally small structuring element. These sub-pixel dilations form the core of the numerical solution schemes. One of the simpler morphological facet models leads to a numerical scheme that is identical to a well-known classical upwind finite difference scheme.

Consider the parameterized shape contour C(p,t) as a function of the path parameter p and the “time” parameter t. The generic evolution of shape as a function of time is given by:

$$\frac{\partial C}{\partial t} = \Gamma(\kappa)\,N$$

where Γ is a function of the curvature κ and N is the inward-pointing normal to the curve. The choice Γ = −1 is equivalent to the dilation of the shape with a disk of radius t. A purely local description of the dilation (as in the above PDE) leads to self-intersecting curves. Dorst and van den Boomgaard [9] used this local geometrical description as their definition of the tangential dilation. The classical morphological dilation corresponds to the entropy solution of the PDE (i.e. the solution without self-intersections).

Fig. 1. Contour versus function evolution.

M. Nielsen et al. (Eds.): Scale-Space’99, LNCS 1682, pp. 199–210, 1999.
© Springer-Verlag Berlin Heidelberg 1999

R. van den Boomgaard

A robust way of obtaining entropy solutions is to embed the curve as a level set in a function and solve the associated PDE that describes the evolution of the function in time. Let F be a function of space (parameter x) and time (parameter t), and let some level set of F at time t = 0 correspond with the original curve C. It can easily be shown that the evolution of the function F such that the level set behaves in time as if the ‘curve PDE’ is solved is given by:

$$\frac{\partial F}{\partial t} = -\Gamma(\kappa)\,\|\nabla F\|.$$

Note that the embedding of the curve is chosen in such a way that the shape is characterized by high function values. In that case the gradient vector is indeed the inward-pointing normal. For Γ = −1 we recognize the PDE that is solved by dilating the initial condition (the function F at time t = 0) with a flat disk-shaped structuring element of radius t. For an arbitrary Γ with −Γ positive, the PDE can be interpreted as a geometry-controlled dilation. It should be noted that such spatially variant dilations are completely within the scope of the morphological (complete lattice) framework (see Heijmans [10]).

When a 2D curve is embedded in a 2D function, a remarkable thing happens. The geometry of the curve is not measured in the spatial domain alone; instead, the smoothness of the embedding function is used to measure the geometry of the curve through its function derivatives. Whereas in the curve representation the points on the curve are moved to a new position in the same (horizontal) plane, in the embedding function the points are moved in the vertical direction (i.e. the function value is changed). This is only allowed if the required smoothness of the embedding function (which is a mathematical construct) is guaranteed throughout the evolution process.

In this contribution we will not look at the numerous applications of these 
types of morphological PDE’s in the computer vision context. Instead we will 
concentrate on numerical schemes to find solutions. In the mathematical litera- 
ture, the derivation of robust and stable numerical schemes is complex and relies 
on the analysis of the conservation law properties of the PDE. 

In [4] it was shown that the morphological dilation, out of all possible solutions (note that these types of PDE’s do not have a unique solution), selects the entropy solution (which is unique). With this observation in mind, this paper introduces the morphological facet model as an elegant method to derive robust and stable numerical schemes to solve these PDE’s. To allow for small time steps in the solution, corresponding to small radii of the dilation disk, the morphological facet model facilitates sub-pixel dilations. In section 2 a short introduction to morphological PDE’s is given. In section 3.1 two classical numerical schemes for solving these PDE’s are given for reference and comparison. In section 3.2 the morphological facet model is introduced; one of the morphological facet models leads to a numerical scheme that is equivalent to a classical scheme. In section 3.3 some numerical experiments are presented.



2 Morphological PDE’s 

In this section a short introduction to the morphological PDE’s is given; a more detailed description can be found in [4]. Consider the one-parameter family of images F obtained by dilating a function f with structuring function $g^t$ for varying t:

$$F(x,t) = (f \oplus g^t)(x)$$

with g a concave function and $g^t$ the umbral scaling of g, defined as $g^t(x) = t\,g(x/t)$. In umbral scaling not only the spatial dimensions are scaled by a factor t, but also the grey-value dimension. Essentially, the graph of the function, interpreted as a geometrical object, is scaled.

In this section it will be shown what happens if we change the scale from t to t + dt. Because the umbral scalings of any concave function form a semi-group under dilation (i.e. $g^s \oplus g^t = g^{s+t}$), we can write:

$$F(x,t+dt) = (f \oplus g^{t+dt})(x) = ((f \oplus g^t) \oplus g^{dt})(x) = (F(\cdot,t) \oplus g^{dt})(x)$$
Fig. 2. Dilating a planar function. The vertical shift when dilating a planar function is given by the slope transform of the structuring function.



As we are interested in the case that dt is infinitesimally small, meaning that $g^{dt}$ becomes very sharply pointed and indeed looks like the morphological pulse, we may approximate the function F locally around the point x with its tangent plane.

Planar functions are the eigenfunctions of mathematical morphology. Dilating (eroding) a planar function with any structuring function g results in a planar function with the same slope; it is only shifted. The vertical shift equals the intercept with the function axis of the tangent plane at the point of the structuring function where the slope equals the slope of the plane to be dilated (this is illustrated in figure 2). This geometrical construction (for all tangent planes to the function g) gives the Legendre transform of the (concave) function g. The generalization of the Legendre transform to arbitrary functions is called the slope transform [9].

The dilation of a planar function $e_\omega(x) = \omega \cdot x + c$ is therefore equal to $e_\omega \oplus g = e_\omega + S[g](\omega)$, where $S[g]$ is the slope transform of g. In the case of the tangent plane to the function F in the point x, we obtain:

$$F(x,t+dt) = F(x,t) + S[g^{dt}](\nabla F(x,t))$$

(note that $\nabla F(x,t)$ is the ‘slope’ of the tangent plane). In [9] it is proven that umbral scaling in the spatial domain amounts to grey-value scaling (i.e. multiplication by a constant) in the slope domain. Thus we have:

$$F(x,t+dt) = F(x,t) + dt\,S[g](\nabla F(x,t)).$$

Rearranging terms and taking the limit $dt \to 0$:

$$\frac{\partial F}{\partial t}(x,t) = S[g](\nabla F(x,t)).$$
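The eigenfunction property and the scaling rule can be checked numerically in 1D. This sketch approximates each sup by a max over evenly spaced samples (an assumption that is exact here because the optimum lies on the sample grid), uses the quadratic structuring function $g(x) = -x^2/4$ discussed in this section, and verifies both $e_\omega \oplus g = e_\omega + S[g](\omega)$ and $S[g^t] = t\,S[g]$. The helper names are hypothetical.

```python
def slope_transform(g, omega, ys):
    """S[g](omega) = sup_y (g(y) - omega * y), approximated by a max over the samples ys."""
    return max(g(y) - omega * y for y in ys)


def dilate_at(e, g, x, ys):
    """(e ⊕ g)(x) = sup_y e(x - y) + g(y), approximated by a max over the samples ys."""
    return max(e(x - y) + g(y) for y in ys)


ys = [k * 0.25 for k in range(-40, 41)]    # samples of y in [-10, 10]
g = lambda y: -y * y / 4                   # quadratic structuring function
omega, c = 0.5, 1.0
e = lambda x: omega * x + c                # planar (in 1D: linear) function

S = slope_transform(g, omega, ys)
for x in (-1.0, 0.0, 2.0):
    # dilating the planar function only shifts it vertically, by S[g](omega)
    assert abs(dilate_at(e, g, x, ys) - (e(x) + S)) < 1e-12

t = 0.5
g_t = lambda y: t * g(y / t)               # umbral scaling g^t
assert abs(slope_transform(g_t, omega, ys) - t * S) < 1e-12
print(S)                                   # 0.25, i.e. omega ** 2
```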



This analysis shows that the family of images generated by dilation with the umbral scaling of a concave structuring function is causal in the scale parameter: the change in grey value going from scale t to scale t + dt is determined by the (first-order) differential structure of the image at scale t.

In summary, the entropy solution of the PDE $\partial F/\partial t = G(\nabla F)$ with initial condition $F(x,0) = f(x)$ is given by the dilation $F(x,t) = (f \oplus g^t)(x)$, where G is the slope transform of g.

As an example, consider the PDE $\partial F/\partial t = \|\nabla F\|$. The inverse slope transform of $G(\omega) = \|\omega\|$ is the ‘indicator’ function $g_B$:

$$g_B(x) = \begin{cases} 0 & : \|x\| \le 1 \\ -\infty & : \text{elsewhere} \end{cases}$$



Note that the slope transform can be applied to non-differentiable functions like $g_B$ (for details see [9]). The PDE $\partial F/\partial t = \|\nabla F\|$ is thus solved by a dilation of f using a flat disk-shaped structuring element of radius t. This PDE is encountered not only in morphological image processing, where the disk-shaped structuring element has a radius greater than the pixel size, but also in non-linear curvature-controlled deformation of shape boundaries. There, the radius of dilation is controlled by the curvature of the boundary. The dilation to get from $F(\cdot,t)$ to $F(\cdot,t+dt)$ uses a disk with infinitely small radius controlled by the local (position-dependent) geometry. This observation already hints at numerical schemes to solve the PDE: dilations with disks smaller than the pixel distance (sub-pixel morphological operators). Sub-pixel dilations are also used in geometrical measurements, where the difference between a shape and a dilated version of it is proportional to certain geometrical properties of the shape [11].

As a second example, consider the PDE $\partial F/\partial t = \|\nabla F\|^2$, with initial condition $F(x,0) = f(x)$. The inverse slope transform of $G(\omega) = \|\omega\|^2$ is the quadratic structuring function $g(x) = -\|x\|^2/4$, i.e. the PDE is solved with dilations using a quadratic structuring element of scale t. This PDE is the morphological equivalent of the linear diffusion equation [4].
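Both examples can be checked against closed forms. For the smooth initial condition $f(x) = -x^2$ (an assumption made for this sketch), dilation by the umbrally scaled flat disk gives $F(x,t) = -(|x| - t)^2$ for $|x| > t$, and dilation by the umbrally scaled quadratic gives $F(x,t) = -x^2/(1+4t)$; differentiating shows that the first satisfies $F_t = |F_x|$ and the second $F_t = F_x^2$. The sketch evaluates both dilations in 1D by a max over dyadic samples, chosen so that the optimum lies exactly on the sample grid; the helper names are hypothetical.

```python
import math


def dilate_at(f, g, x, ys):
    """(f ⊕ g)(x) = sup_y f(x - y) + g(y), approximated by a max over the samples ys."""
    return max(f(x - y) + g(y) for y in ys)


def umbral(g, t):
    """Umbral scaling g^t(y) = t * g(y / t)."""
    return lambda y: t * g(y / t)


f = lambda x: -x * x                                   # smooth initial condition
g_disk = lambda y: 0.0 if abs(y) <= 1 else -math.inf   # flat unit 'disk' (an interval in 1D)
g_quad = lambda y: -y * y / 4                          # quadratic structuring function

ys = [k / 1024 for k in range(-3072, 3073)]            # dyadic samples of y in [-3, 3]
x, t = 1.5, 0.5

F_disk = dilate_at(f, umbral(g_disk, t), x, ys)        # solves dF/dt = |dF/dx|
F_quad = dilate_at(f, umbral(g_quad, t), x, ys)        # solves dF/dt = (dF/dx)^2
assert abs(F_disk - (-(abs(x) - t) ** 2)) < 1e-12      # closed form -(|x| - t)^2
assert abs(F_quad - (-x * x / (1 + 4 * t))) < 1e-12    # closed form -x^2 / (1 + 4t)
print(F_disk, F_quad)                                  # -1.0 -0.75
```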



3 Numerical solutions 



In this section we look at numerical schemes to solve the PDE

$$\frac{\partial F}{\partial t} = \|\nabla F\| \qquad (1)$$

with initial condition $F(x,0) = f(x)$. Only forward Euler schemes will be considered. Let $F_{i,j,r} = F(i\Delta x, j\Delta y, r\Delta t)$; then the forward Euler numerical difference scheme is given by:

$$F_{i,j,r+1} = F_{i,j,r} + \Delta t\sqrt{(D^{x}F_{i,j,r})^2 + (D^{y}F_{i,j,r})^2}$$

where $D^{x}F_{i,j,r}$ is the finite difference approximation to $F_x(i\Delta x, j\Delta y, r\Delta t)$ (and similarly for $D^{y}$). The choice of these finite difference operators proves to be the crucial step. Simple central differences like

$$D^{x}F_{i,j,r} = \frac{F_{i+1,j,r} - F_{i-1,j,r}}{2\Delta x} \qquad (2)$$

do not work: even when very small time steps Δt are used, stability is not guaranteed.
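One way central differences go wrong can be seen on a step edge. The sketch below is a 1D analogue of equation (2) for $F_t = |F_x|$, with replicated border samples and a hypothetical function name, and it performs a single forward Euler step. The step edge has maximum 1, yet the update produces 1.25, a value above anything present in the data; a dilation can never do that.

```python
def central_euler_step(F, dt, dx=1.0):
    """One forward-Euler step of dF/dt = |dF/dx| using central differences (1D)."""
    P = [F[0]] + list(F) + [F[-1]]            # replicate the border samples
    return [F[i] + dt * abs(P[i + 2] - P[i]) / (2 * dx) for i in range(len(F))]


F = [0.0, 0.0, 1.0, 1.0, 1.0]                 # a step edge with maximum 1
G = central_euler_step(F, dt=0.5)
print(G)                                       # [0.0, 0.25, 1.25, 1.0, 1.0]
print(max(G) > max(F))                         # True: a new maximum was created
```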



3.1 Upwind finite difference schemes 

Based on the analysis of the PDE (especially the fact that it expresses a conservation law and the fact that we are looking for an entropy solution), several upwind finite difference schemes have been presented in the literature. The simplest one is given by Osher and Sethian [5]:

$$F_{i,j,r+1} = F_{i,j,r} + \Delta t\sqrt{(D^{-x}F)^2 + (D^{+x}F)^2 + (D^{-y}F)^2 + (D^{+y}F)^2} \qquad (3)$$

where:

$$D^{-x}F = \frac{F_{i-1,j,r} - F_{i,j,r}}{\Delta x} \vee 0, \qquad D^{+x}F = \frac{F_{i+1,j,r} - F_{i,j,r}}{\Delta x} \vee 0.$$

Here we use the morphological convention of denoting the supremum (maximum) operator by ∨. Equivalent expressions are used for the directed differences in the y-direction. A second finite difference scheme to solve the same PDE is due to Rouy and Tourin (as cited in [12]):

$$F_{i,j,r+1} = F_{i,j,r} + \Delta t\sqrt{(D^{-x}F \vee D^{+x}F)^2 + (D^{-y}F \vee D^{+y}F)^2}. \qquad (4)$$



It is not within the scope of this paper to give the derivation of these upwind difference schemes. Instead, the following section will show that the upwind schemes are in complete accordance with the schemes that implement the sub-pixel morphological operations.
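Both upwind schemes are easy to transcribe. The sketch below assumes Δx = Δy, a 2D grid stored as a list of lists indexed F[i][j], replicated borders, and the one-sided differences $D^{\pm x}F$, $D^{\pm y}F$ defined above as increases towards a neighbour clipped at 0; the function names are hypothetical. The two schemes differ at a pixel whose two neighbours along the same axis are both higher: equation (3) adds both contributions, whereas equation (4) keeps only the larger one.

```python
import math


def _increases(F, i, j, dx):
    """D^{-x}F, D^{+x}F, D^{-y}F, D^{+y}F at (i, j): increases towards each neighbour, clipped at 0."""
    n, m = len(F), len(F[0])

    def at(a, b):  # replicate the border values
        return F[min(max(a, 0), n - 1)][min(max(b, 0), m - 1)]

    c = F[i][j]
    return (max((at(i - 1, j) - c) / dx, 0.0), max((at(i + 1, j) - c) / dx, 0.0),
            max((at(i, j - 1) - c) / dx, 0.0), max((at(i, j + 1) - c) / dx, 0.0))


def osher_sethian_step(F, dt, dx=1.0):
    """One step of scheme (3): all four one-sided increases enter the square root."""
    def upd(i, j):
        dmx, dpx, dmy, dpy = _increases(F, i, j, dx)
        return F[i][j] + dt * math.sqrt(dmx ** 2 + dpx ** 2 + dmy ** 2 + dpy ** 2)
    return [[upd(i, j) for j in range(len(F[0]))] for i in range(len(F))]


def rouy_tourin_step(F, dt, dx=1.0):
    """One step of scheme (4): per axis, only the larger one-sided increase is used."""
    def upd(i, j):
        dmx, dpx, dmy, dpy = _increases(F, i, j, dx)
        return F[i][j] + dt * math.hypot(max(dmx, dpx), max(dmy, dpy))
    return [[upd(i, j) for j in range(len(F[0]))] for i in range(len(F))]


pit = [[1.0, 0.0, 1.0]]                     # both y-neighbours of the centre are higher
print(osher_sethian_step(pit, 0.1)[0][1])   # about 0.1414 (= 0.1 * sqrt(2))
print(rouy_tourin_step(pit, 0.1)[0][1])     # 0.1
```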



3.2 The Morphological Facet Model 

From a morphological point of view it is not surprising that the classical finite difference schemes needed to solve ‘morphological PDE’s’ contain max and min functions. In this section it will be shown that finite difference schemes identical to the schemes cited in the previous section can easily be derived, starting from the fact that the PDE is actually solved by a morphological dilation¹.

In section 2 it was already stated that the operation to derive $F(x,t+\Delta t)$ given $F(x,t)$ is to dilate the function $F(\cdot,t)$ with a disk of radius Δt. In this section we consider the dilation of a function with a disk-shaped structuring element of radius Δt < 1 (in units of the grid spacing). For such small values of the radius a sampled version of the disk is of no use, as only the origin is a grid point. To be able to dilate the sampled function we therefore propose the morphological facet model:

¹ The use of a morphological dilation to solve these types of differential equations is certainly not new. Burgers [13] himself presented a geometrical construction to solve ‘his’ PDE, which would nowadays be immediately recognized as a morphological dilation (using a parabolic structuring function).
Fig. 3. Morphological facet models: a) 4-beam, b) 8-beam, c) 4-plane and d) 8-plane.



- approximate the discrete data in a small neighborhood with a function, then

- dilate this function analytically and, finally,

- sample the dilated function to give the final result.

We use the term morphological facet model because of its resemblance to the facet model which is used to approximate the derivatives of a sampled function (see Haralick [14]). Any facet model is characterized by:

- the analytical function that is fitted to the data, and

- the size and shape of the neighborhood from which the data is considered in the fitting.

The function used in the local approximation of the function data has to be chosen in such a way that the desired operation (calculating the derivatives in the classical facet model and dilation in the morphological facet model, respectively) can be calculated analytically. Whereas in the classical (linear) facet model the function needs to be differentiable (its sole purpose is to calculate the derivatives), in the morphological facet model a crucial requirement is that the local range of function values is preserved. If this were not the case, the dilation of the approximated function could result in function values that cannot be “explained” by function values in the sampled data. This is exactly the main problem when ‘solving’ the morphological PDE’s with a simple linear finite difference scheme that is based on a facet model that does not obey the range requirement. Differentiability of the function is not a primary concern; dilations tend to result in non-differentiable functions anyway. Even continuity of the function is not of primary concern in the morphological facet model.

The first morphological facet model considered is a degenerate facet model. Instead of fitting a surface to the data points, just the “beams” between the central data point and the neighboring points are considered. Two beam models are distinguished. The 4-beam model considers only the beams to the 4-connected neighbors. The 8-beam model considers the beams to all 8-connected neighbors. Both beam models are illustrated in figure 3. In dilating the facet model, we are only interested in the dilation value in the central pixel. Evidently, the final dilation result is the maximum of the dilations of the individual beams. Consider the first beam, connecting the central grid point with value $F_{i,j,r}$ with the first neighbor with value $F_{i+1,j,r}$. In case $F_{i,j,r} \ge F_{i+1,j,r}$, the dilation result in the central pixel is $F_{i,j,r}$, i.e. the disk hits the beam in its highest point (the origin). In case the other end of the beam is the highest point (i.e. $F_{i+1,j,r} > F_{i,j,r}$), the dilation value equals $F_{i,j,r+1} = F_{i,j,r} + \Delta t\,(F_{i+1,j,r} - F_{i,j,r})$. The selection according to the ordering of $F_{i,j,r}$ and $F_{i+1,j,r}$ can elegantly be cast in a maximum operator:

$$F_{i,j,r+1} = F_{i,j,r} + \Delta t\,\big(0 \vee (F_{i+1,j,r} - F_{i,j,r})\big).$$

An equivalent analysis can be done for all 4-connected neighbors, leading to the following morphological finite dilation scheme:

$$F_{i,j,r+1} = F_{i,j,r} + \Delta t \bigvee_{(k,l)\in N_4} (F_{i+k,j+l,r} - F_{i,j,r}) \qquad (5)$$

The beam model is easily extended to take all 8 neighbors into consideration. For the diagonal neighbors a distance correction is then needed. This leads to:

$$F_{i,j,r+1} = F_{i,j,r} + \Delta t \bigvee_{(k,l)\in N_8} w(k,l)\,(F_{i+k,j+l,r} - F_{i,j,r}) \qquad (6)$$

where $w(k,l) = 1/\sqrt{2}$ for the diagonal neighbors and 1 for the other points in the 8-connected neighborhood. Note that it is essential that the central pixel itself (i.e. $(k,l) = (0,0)$) is also considered in the above maximum, as it provides the necessary positivity of the dilation offset.
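The beam schemes (5) and (6) can be sketched in the same setting as before: grid spacing 1, a list-of-lists grid, replicated borders and hypothetical names. The offset (0, 0) is included in the maximum, which keeps the dilation offset non-negative.

```python
import math


def beam_step(F, dt, eight=False):
    """One sub-pixel dilation step with the 4-beam model (5) or, if eight=True, the 8-beam model (6)."""
    n, m = len(F), len(F[0])
    offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    if eight:
        offsets += [(1, 1), (1, -1), (-1, 1), (-1, -1)]

    def at(a, b):  # replicate the border values
        return F[min(max(a, 0), n - 1)][min(max(b, 0), m - 1)]

    def weight(k, l):  # w(k, l): 1/sqrt(2) for diagonal beams, 1 otherwise
        return 1 / math.sqrt(2) if k != 0 and l != 0 else 1.0

    def upd(i, j):
        return F[i][j] + dt * max(weight(k, l) * (at(i + k, j + l) - F[i][j]) for k, l in offsets)

    return [[upd(i, j) for j in range(m)] for i in range(n)]


pulse = [[0.0] * 5 for _ in range(5)]
pulse[2][2] = 1.0
print(beam_step(pulse, 0.5)[1][1])               # 0.0
print(beam_step(pulse, 0.5, eight=True)[1][1])   # about 0.3536 (= 0.5 / sqrt(2))
```

On a single pulse, the 4-beam model leaves the diagonal neighbours untouched during the first step, whereas the 8-beam model raises them by $\Delta t/\sqrt{2}$.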

More complex morphological facet models are obtained when interpolating planar surfaces are used, as shown in figure 3c and d. The 4-plane model interpolates the data points with 4 planar function patches, each of them defined in one of the four quadrants. Dilation of the facet model is then equivalent to the maximum of the 4 dilations of the individual triangular facets. In section 2 it was indicated that dilating a plane with a disk amounts to adding the disk radius times the gradient norm. Because in the planar facet model only a small triangular patch of the plane is dilated, we have to make sure that the ‘point of contact’ is indeed within that patch. Let $p^1_{i,j}$ be the planar function in the first quadrant:

$$p^1_{i,j}(x,y) = \begin{cases} F_{i,j,r} + (F_{i+1,j,r} - F_{i,j,r})\,x + (F_{i,j+1,r} - F_{i,j,r})\,y & : x \ge 0,\ y \ge 0,\ x + y \le 1 \\ -\infty & : \text{elsewhere} \end{cases}$$



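The dilation result of this patch with a disk $B_{\Delta t}$ of radius Δt, in the central point, is given by:

$$(p^1_{i,j} \oplus B_{\Delta t})(0,0) = \begin{cases} F_{i,j,r} + \Delta t\sqrt{(F_{i+1,j,r} - F_{i,j,r})^2 + (F_{i,j+1,r} - F_{i,j,r})^2} & : (*) \\ F_{i,j,r} + \Delta t\,\big(0 \vee (F_{i+1,j,r} - F_{i,j,r}) \vee (F_{i,j+1,r} - F_{i,j,r})\big) & : \text{elsewhere} \end{cases}$$

where (*) indicates the condition that $F_{i+1,j,r} - F_{i,j,r} > 0$ and $F_{i,j+1,r} - F_{i,j,r} > 0$.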

Thus, in case the point of contact lies within the first quadrant, the dilation adds Δt times the gradient norm to the function value. When the gradient vector points outside the first quadrant, the disk hits the triangular planar patch at one of the two ‘beams’. The above expression can be simplified to:

$$(p^1_{i,j} \oplus B_{\Delta t})(0,0) = F_{i,j,r} + \Delta t\sqrt{\big(0 \vee (F_{i+1,j,r} - F_{i,j,r})\big)^2 + \big(0 \vee (F_{i,j+1,r} - F_{i,j,r})\big)^2}.$$
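The case distinction for a single triangular patch can be checked by brute force. Writing $a = F_{i+1,j,r} - F_{i,j,r}$ and $b = F_{i,j+1,r} - F_{i,j,r}$, a linear function attains its maximum over the quarter disk of radius Δt at the origin or on the arc, so sampling the arc suffices. The sketch assumes $\Delta t \le 1/\sqrt{2}$, so that the quarter disk lies inside the triangle $x + y \le 1$; the names are hypothetical.

```python
import math
import random


def patch_increase_formula(a, b, dt):
    """dt * sqrt((0 ∨ a)^2 + (0 ∨ b)^2): increase at the centre for the first-quadrant patch."""
    return dt * math.hypot(max(a, 0.0), max(b, 0.0))


def patch_increase_brute(a, b, dt, n=2000):
    """Max of a*x + b*y over the quarter disk r <= dt, 0 <= theta <= pi/2.

    The maximum of a linear function over this convex sector lies at the origin
    (value 0) or on the arc, so only arc samples are needed besides 0.
    """
    best = 0.0
    for k in range(n + 1):
        th = (math.pi / 2) * k / n
        best = max(best, dt * (a * math.cos(th) + b * math.sin(th)))
    return best


rng = random.Random(1)
for _ in range(100):
    a, b = rng.uniform(-2, 2), rng.uniform(-2, 2)
    assert abs(patch_increase_formula(a, b, 0.5) - patch_increase_brute(a, b, 0.5)) < 1e-6
print("formula matches brute force")
```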

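The dilation of the entire 4-plane facet model is equal to the maximum of the 4 individual dilations:

$$F_{i,j,r+1} = F_{i,j,r} + \Delta t \bigvee_{k,l \in \{-1,1\}} \sqrt{\big(0 \vee (F_{i+k,j,r} - F_{i,j,r})\big)^2 + \big(0 \vee (F_{i,j+l,r} - F_{i,j,r})\big)^2}$$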

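Careful analysis¹ of the above equation reveals that it is equivalent to the
Rouy and Tourin scheme (equation 4):

$$ F_{i,j,r+1} = F_{i,j,r} + \Delta t\,\Big( \big(0 \vee (F_{i+1,j,r} - F_{i,j,r}) \vee (F_{i-1,j,r} - F_{i,j,r})\big)^2 + \big(0 \vee (F_{i,j+1,r} - F_{i,j,r}) \vee (F_{i,j-1,r} - F_{i,j,r})\big)^2 \Big)^{1/2} . $$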
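The equivalence can also be checked numerically. The following sketch (hypothetical function names, unit grid spacing, border pixels left unchanged) computes the 4-plane update as the maximum of the four quadrant dilations, each in the simplified form, and the Rouy and Tourin update directly; on a random image the two agree to rounding error.

```python
import math
import random

def four_plane_step(F, dt):
    """Maximum of the four quadrant dilations (interior pixels; border kept)."""
    rows, cols = len(F), len(F[0])
    G = [row[:] for row in F]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            c = F[i][j]
            di = (F[i + 1][j] - c, F[i - 1][j] - c)   # differences along i
            dj = (F[i][j + 1] - c, F[i][j - 1] - c)   # differences along j
            # each (a, b) pair belongs to one quadrant's triangular planar patch
            G[i][j] = max(c + dt * math.hypot(max(0.0, a), max(0.0, b))
                          for a in di for b in dj)
    return G

def rouy_tourin_step(F, dt):
    """Upwind scheme of Rouy and Tourin for the dilation PDE (grid spacing 1)."""
    rows, cols = len(F), len(F[0])
    G = [row[:] for row in F]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            c = F[i][j]
            gi = max(0.0, F[i + 1][j] - c, F[i - 1][j] - c)
            gj = max(0.0, F[i][j + 1] - c, F[i][j - 1] - c)
            G[i][j] = c + dt * math.hypot(gi, gj)
    return G

random.seed(0)
F = [[random.random() for _ in range(12)] for _ in range(12)]
A = four_plane_step(F, 0.7)
B = rouy_tourin_step(F, 0.7)
```

The agreement follows because the square root of a sum of clipped squares is non-decreasing in each clipped difference, so the maximum over the four (a, b) pairs is reached by taking the largest clipped difference in each direction separately.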



The extension of the 4-plane model to the 8-plane model is straightforward. In
this case each of the 8 planar patches is defined within an octant, which makes the
check whether the point of contact lies within the region of definition more complex.



3.3 Numerical Experiments 

The experiments presented in this section are meant to illustrate the concepts 
developed in previous sections. The morphological numerical schemes are com- 
pared with the classical Rouy and Tourin scheme. More detailed experiments 
concerning stability and accuracy are the subject of future research. 

Figure 4 shows the experiment in which a pulse (in a 64 × 64 image) is the
initial condition to the PDE given in equation (1). As explained in the previous
section, the Rouy and Tourin (abbreviated as R&T) upwind scheme is equivalent
to the morphological 4-plane facet model. From the figures it is clear that the
morphological beam models suffer from severe anisotropy and are therefore of little
practical use.

¹ To prove the equivalence, remember that √a ∨ √b = √(a ∨ b) and that
(a + b) ∨ (a + c) = a + (b ∨ c) for non-negative a, b and c. Using these equalities
the proof is straightforward.
Fig. 4. Sub-pixel dilation of a pulse. In a(A) the original 64 × 64 initial condition
of the PDE is shown. The PDE is solved with 5 numerical schemes. In b(B) the R&T
scheme is depicted, in c(C) the 4-beam facet model, in d(D) the 8-beam model, in e(E)
the 4-plane model and in f(F) the 8-plane model.



A comparison of the morphological 4-plane and 8-plane models shows that,
whereas the 8-plane model gives the most isotropic solution, it does so at the cost
of being more dissipative (i.e. more ‘smoothing’ is introduced). An advantage of the
8-plane model is that the scale step Δt can be chosen significantly larger than
for the 4-plane model: for the 8-plane model we have Δt/Δx < cos(π/8) ≈ 0.92,
compared with Δt/Δx < cos(π/4) ≈ 0.71 for the 4-plane model. These bounds
follow from the observation that stability requires the disk to actually hit one
of the planes as defined in the small neighbourhood under consideration (and not
their analytical continuation). The 8-plane model can therefore be used with larger
time steps, leading to more efficient solution schemes because fewer iterations are
needed.
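The step-size bounds translate directly into iteration counts. The snippet below assumes equal time steps and uses a hypothetical helper `min_iterations` to compute the smallest number of steps that reaches a given scale while keeping Δt/Δx strictly below each bound.

```python
import math

def min_iterations(T, dx, bound):
    """Smallest number n of equal steps with (T / n) / dx strictly below `bound`."""
    return math.floor(T / (bound * dx)) + 1

bound_4 = math.cos(math.pi / 4)   # 4-plane model
bound_8 = math.cos(math.pi / 8)   # 8-plane model
for name, b in (("4-plane", bound_4), ("8-plane", bound_8)):
    print(f"{name}: dt/dx < {b:.2f}, steps to reach t = 10: {min_iterations(10.0, 1.0, b)}")
```

For a target scale t = 10 with Δx = 1 this gives 15 steps for the 4-plane model and 11 steps for the 8-plane model.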

Figure 5 depicts the second experiment in the same layout; only the initial
condition was changed. This time a smooth function (the function ‘peaks’ from
MATLAB: a weighted sum of several Gaussian functions) is used as the initial
condition. Again we observe that the morphological beam models perform poorly,
whereas differences between the 4-plane and 8-plane models are hardly noticeable.

Finally, figure 6 shows the experiment in which noise has been added to the
smooth function used in the previous experiment. This experiment shows that
the smoothness of the function has no influence on the stability of the numerical
solution schemes.

4 Conclusions 

In this paper we have introduced the morphological facet model as a method to 
implement sub-pixel morphological dilations (and of course also erosions) and 
Fig. 5. Sub-pixel dilation of a continuous and smooth function. In a(A) the
original 64 × 64 initial condition of the PDE is shown. The PDE is solved with 5
numerical schemes. In b(B) the R&T scheme is depicted, in c(C) the 4-beam facet
model, in d(D) the 8-beam model, in e(E) the 4-plane model and in f(F) the 8-plane
model.





Fig. 6. Sub-pixel dilation of a function with a substantial amount of noise
added. In a(A) the original 64 × 64 initial condition of the PDE is shown. The PDE
is solved with 5 numerical schemes. In b(B) the R&T scheme is depicted, in c(C) the
4-beam facet model, in d(D) the 8-beam model, in e(E) the 4-plane model and in f(F)
the 8-plane model.
thus to provide a stable numerical scheme to solve the class of morphological
PDE’s. The morphological numerical scheme based on the 4-plane facet model proves
to be equivalent to the classical upwind numerical scheme of Rouy and Tourin.

Future research will look at the PDE’s where the dilation/erosion is locally 
controlled by the observed geometry of the shape (i.e. its curvature). The sim- 
plest way to use the morphological schemes described in this paper is to use 
a classical facet model (e.g. bicubic) to observe the local differential geometry 
(or use Gaussian (fuzzy) derivatives) and calculate the curvature and then to 
use a morphological facet model to perform the sub-pixel erosion/dilation. A 
more unified approach is to look for facet models that allow both analytical 
morphological operations as well as differentiation. 
Abstract. We show that regularization methods can be regarded as
scale-spaces where the regularization parameter serves as scale. In analogy 
to nonlinear diffusion filtering we establish continuity with respect to 
scale, causality in terms of a maximum-minimum principle, simplifica- 
tion properties by means of Lyapunov functionals and convergence to a 
constant steady-state. We identify nonlinear regularization with a single 
implicit time step of a diffusion process. This implies that iterated regu- 
larization with small regularization parameters is a numerical realization 
of a diffusion filter. Numerical experiments in two and three space dimen- 
sions illustrate the scale-space behaviour of regularization methods. 



1 Introduction 

There has often been a fruitful interaction between linear scale-space techniques 
and regularization methods. Torre and Poggio [28] emphasized that differentia- 
tion is ill-posed in the sense of Hadamard, and applying suitable regularization 
strategies approximates linear diffusion filtering or - equivalently - Gaussian 
convolution. Much of the linear scale-space literature is based on the regu- 
larization properties of convolutions with Gaussians. In particular, differential 
geometric image analysis is performed by replacing derivatives by Gaussian- 
smoothed derivatives; see e.g. [8,14,25] and the references therein. In a very 
nice work, Nielsen et al. [15] derived linear diffusion filtering axiomatically from 
Tikhonov regularization, where the stabilizer consists of a sum of squared deriva- 
tives up to infinite order. 

Nonlinear diffusion filtering can be regarded both as a restoration method 
and a scale-space technique [10,19,29]. When considering the restoration prop- 
erties, natural relations between biased diffusion and regularization theory exist 
via the Euler equation for the regularization functional. This Euler equation can 
be regarded as the steady-state of a suitable nonlinear diffusion process with 
a bias term [5,18,24]. A popular specific energy functional arises from unconstrained
total variation denoising [1,3,4]. Constrained total variation also leads to
a nonlinear diffusion process with a bias term using a time-dependent Lagrange 
multiplier [21]. 

When regarding nonlinear diffusion as a scale-space method we have to ensure 
that architectural, invariance and simplification properties exist [2]. A typical 
architectural property is continuity with respect to the scale parameter, a charac- 
teristic invariance property is the average grey level invariance, and simplification 
qualities can be stated in terms of a maximum-minimum principle, Lyapunov 
functionals and convergence to a constant steady-state [29] . 

Strong and Chan [27] proposed to regard the regularization parameter of 
total variation denoising as a scale parameter. However, a corresponding scale- 
space interpretation of regularization methods, which is in analogy to results 
for nonlinear diffusion scale-spaces, has been missing so far. This topic will be 
discussed in the present paper. We show that there exists a scale-space theory for 
regularization methods which resembles very much the one for nonlinear diffu- 
sion filtering. Following [12,22,27] we interpret the regularization parameter as 
a diffusion time by considering regularization as time-discrete diffusion filtering 
with a single implicit time step. Consequently, iteration of regularization with 
small regularization parameters can be regarded as an approximation to diffusion 
filtering. 

Our paper is organized as follows: In Section 2 we survey scale-space proper- 
ties of diffusion filtering. In Sections 3 and 4 an analogous theory for noniterated 
and iterated regularization techniques is established. Due to the lack of space we 
can survey only the main results. Proofs and full details can be found in technical 
reports [23,20]. In Section 5 we present some experiments with 2D MR images 
and 3D ultrasound data, and compare the restoration properties of noniterated 
and iterated regularization. 

2 Diffusion Filtering 

In this section we review essential scale-space properties of nonlinear diffusion 
filtering. The presented results can also be extended to a broader class of methods 
including regularized filters with nonmonotone flux functions and anisotropic 
filters with a diffusion tensor. More details and proofs can be found in [29]. 

We consider a diffusion process of the form

$$ \begin{cases} \partial_t u(x,t) = \nabla \cdot \big(g(|\nabla u|^2)\,\nabla u\big)(x,t) & \text{on } \Omega \times [0,\infty[ \\ \partial_n u(x,t) = 0 & \text{on } \Gamma \times [0,\infty[ \\ u(x,0) = f(x) & \text{on } \Omega . \end{cases} \qquad (1) $$

The image domain Ω ⊂ ℝ^d is assumed to be bounded with piecewise Lipschitzian
boundary Γ with unit normal vector n, and f ∈ L^∞(Ω) is a degraded original
image with a := ess inf f and b := ess sup f.

The diffusivity g satisfies the following properties:

1. g ∈ C^∞([0,∞)).

2. The flux g(s²)s is monotonically increasing in s.

3. g(s) > 0 for all s ≥ 0.

Under these assumptions there exists a unique solution u(x,t) of (1) such that
‖u(·,t)‖_{L²(Ω)} is continuous for t ≥ 0. This continuity property is necessary for
relating structures over scales and for retrieving the original image for t → 0.
It is one of the fundamental architectural ingredients of scale-space theory.
Furthermore, it is possible to show that u(x,t) ∈ C^∞(Ω̄ × (0,∞)).

Moreover, the average grey level is conserved:

$$ \frac{1}{|\Omega|} \int_\Omega u(x,t)\,dx = Mf \quad \text{for all } t > 0 , $$

with

$$ Mf := \frac{1}{|\Omega|} \int_\Omega f(x)\,dx . $$

A constant average grey level is essential for scale-space segmentation algorithms 
such as the hyperstack [16]. It is also a desirable quality in medical imaging 
where grey values measure physical quantities of the depicted object, for instance 
proton densities in MR images. 

The unique solution of (1) fulfills the extremum principle

$$ a \le u(x,t) \le b \quad \text{on } \Omega \times (0,T] . \qquad (2) $$



The extremum principle is an equivalent formulation of Koenderink’s causality 
requirement [11]. Together with the continuity it ensures that level sets can be 
traced back in scale. 

Another important simplification property can be expressed in terms of
Lyapunov functionals. For all r ∈ C²[a,b] with r'' ≥ 0 on [a,b], the function

$$ V(t) := \phi(u(t)) := \int_\Omega r(u(x,t))\,dx \qquad (3) $$

is a Lyapunov functional since it satisfies

1. φ(u(t)) ≥ φ(Mf) for all t ≥ 0,

2. a) V ∈ C[0,∞) ∩ C¹(0,∞),
b) V'(t) ≤ 0 for all t > 0.

Lyapunov functionals show that diffusion filters create simplifying transformations:
the special choices r(s) := |s|^p, r(s) := (s − Mf)^{2n} and r(s) := s ln(s),
respectively, imply that all L^p norms with p ≥ 2 are decreasing, all even central
moments are decreasing, and the entropy S[u(t)] := −∫_Ω u(x,t) ln u(x,t) dx, a
measure of uncertainty and missing information, is increasing with respect to
t. Lyapunov functionals have been used for scale-selection and texture analysis
[26], for the synchronisation of different diffusion scale-spaces [16], and for
the automatic determination of stopping times [31]. Moreover, they allow us to
prove that the filtered image converges to a constant image as t tends to ∞:
lim_{t→∞} ‖u(t) − Mf‖_{L^p(Ω)} = 0 for p ∈ [1,∞). For d = 1 we even have uniform
convergence.
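The conservation, extremum and Lyapunov properties can be observed on a simple space discretization. The sketch below is a 1D explicit finite-difference scheme in flux form with zero-flux (Neumann) boundaries. It assumes unit grid spacing and an example diffusivity of Charbonnier type, g(s²) = 1/√(1 + s²/λ²), which is smooth, positive and has an increasing flux g(s²)s as required above; it illustrates the discrete analogue only and is not the scheme of [29]. All function names are hypothetical.

```python
import math

def diffusivity(s2, lam=1.0):
    """Assumed example: g(s^2) = 1 / sqrt(1 + s^2 / lam^2)."""
    return 1.0 / math.sqrt(1.0 + s2 / lam ** 2)

def explicit_step(u, tau, lam=1.0):
    """Flux-form explicit step for u_t = (g(u_x^2) u_x)_x, spacing 1, Neumann BC."""
    n = len(u)
    flux = [diffusivity((u[i + 1] - u[i]) ** 2, lam) * (u[i + 1] - u[i])
            for i in range(n - 1)]
    new = []
    for i in range(n):
        right = flux[i] if i < n - 1 else 0.0   # no flux across the right boundary
        left = flux[i - 1] if i > 0 else 0.0    # no flux across the left boundary
        new.append(u[i] + tau * (right - left))
    return new

def diffuse(f, tau, steps, lam=1.0):
    # g <= 1, so tau <= 1/2 makes every update a convex combination of neighbours
    if tau > 0.5:
        raise ValueError("tau must be <= 0.5 for this explicit scheme")
    history = [list(f)]
    for _ in range(steps):
        history.append(explicit_step(history[-1], tau, lam))
    return history

def lyapunov(u, r):
    """Discrete counterpart of V = integral of r(u) dx."""
    return sum(r(v) for v in u)

f = [0.0, 0.0, 5.0, 1.0, 1.0, 8.0, 2.0, 2.0]
hist = diffuse(f, 0.25, 200)
mean = sum(f) / len(f)
```

With g ≤ 1 and τ ≤ 1/2 each step multiplies u by a symmetric, non-negative matrix whose rows and columns sum to one. This is why the mean is preserved, all values stay in [a, b], and Σ r(u_i) cannot increase for any convex r, which covers both the squared L² norm and the fourth central moment checked below.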
3 Regularization

An interesting relation between nonlinear diffusion filtering and regularization 
methods becomes evident when considering an implicit time discretization 
[12,22,27]. The first step of an implicit scheme with step-size h in t-direction 
reads as follows. 

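$$ \begin{cases} \dfrac{u(x,h) - f(x)}{h} = \nabla \cdot \big(g(|\nabla u|^2)\,\nabla u\big)(x,h) & \text{on } \Omega \\ \partial_n u(x,h) = 0 & \text{on } \Gamma \\ u(x,0) = f(x) & \text{on } \Omega . \end{cases} \qquad (4) $$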

In the following we assume the existence of a differentiable function g̃ on [0,∞)
which satisfies g̃' = g. Then the minimizer of the functional

$$ T(u) := \|u - f\|^2_{L^2(\Omega)} + h \int_\Omega \tilde g(|\nabla u|^2)\,dx \qquad (5) $$

satisfies (4). This can be seen by calculating the formal Gateaux derivative of T
in direction v, i.e.

$$ \langle T'(u), v \rangle = \lim_{t \to 0^+} \frac{T(u + tv) - T(u)}{t} = \int_\Omega 2(u - f)\,v\,dx + h \int_\Omega 2\,g(|\nabla u|^2)\,\nabla u \cdot \nabla v\,dx . $$

Since a minimizer of (5) satisfies ⟨T'(u), v⟩ = 0 for all v, integrating the second
term by parts and using that v is arbitrary shows that the minimizer satisfies the
differential equation (4), including the natural boundary condition ∂_n u = 0. If the
functional T is convex, then a minimizer of T is uniquely characterized by the
solution of equation (4).
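For the simplest case, Tikhonov regularization with g̃(s) = s (so g ≡ 1), the correspondence between minimizing (5) and one implicit step of (4) can be made concrete in 1D. The sketch below assumes unit grid spacing and forward differences for the gradient. Setting the gradient of the discrete functional to zero gives (I + hL)u = f with the Neumann graph Laplacian L, i.e. the discrete form of (u − f)/h = u_xx, which is solved here with the Thomas algorithm. Function names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: row i reads lower[i-1]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1]."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def tikhonov_step(f, h):
    """Minimiser of sum (u - f)^2 + h * sum (u[i+1] - u[i])^2 (needs len(f) >= 2).

    The optimality condition is (I + h L) u = f with the Neumann Laplacian L,
    i.e. one implicit time step of size h for u_t = u_xx.
    """
    n = len(f)
    diag = [1.0 + h * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = [-h] * (n - 1)
    return solve_tridiagonal(off, diag, off, f)

def functional(u, f, h):
    """Discrete version of T(u) in (5) for the Tikhonov stabilizer."""
    fit = sum((ui - fi) ** 2 for ui, fi in zip(u, f))
    smooth = sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1))
    return fit + h * smooth

f = [0.0, 0.0, 5.0, 1.0, 1.0, 8.0, 2.0, 2.0]
u = tikhonov_step(f, 2.0)
```

Because each row of I + hL sums to one and its inverse is non-negative, u_h is a weighted average of the data. This gives discrete counterparts of grey-level invariance and the extremum principle, and for very large h the result approaches the constant Mf.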

T(u) is a typical regularization functional consisting of the approximation
functional ‖u − f‖²_{L²(Ω)} and the stabilizing functional ∫_Ω g̃(|∇u|²) dx. The
weight h is called the regularization parameter. An extensive discussion of
regularization methods can be found in [7].

Now we present a scale-space theory for a broad class of regularization
methods. For proofs and full details we refer to [23]. Let g satisfy:

1. g(·) is continuous on any compact K ⊂ [0,∞).

2. g(0) = min{g(x) : x ∈ [0,∞)} > 0.

3. g(|·|²) is convex from ℝ^d to ℝ.

4. There exists a constant c > 0 such that g(s) ≥ cs.

5. g is monotone in [0,∞).

These assumptions guarantee existence and uniqueness of a minimizer u_h of the
regularization functional (5) in the Sobolev space H¹(Ω).

They are satisfied for the following regularization techniques:

1. Tikhonov regularization: g̃(|s|²) = |s|².

2. The modified total variation regularization of Ito and Kunisch [13]:
g̃(|s|²) = √(|s|²) + α|s|² with α > 0.

3. The modified total variation regularization of Nashed and Scherzer [17]:
g̃(|s|²) = √(|s|² + β²) + α|s|².

4. The regularization of Geman and Yang [9] and Chambolle and Lions [3], where
g̃ is defined piecewise on the ranges |s| ≤ ε, ε ≤ |s| ≤ t and |s| ≥ t.

5. Schnörr’s [24] convex nonquadratic regularization:

$$ \tilde g(|s|^2) = \begin{cases} \lambda_h^2 |s|^2 & : |s| \le c_\rho \\ \lambda_l^2 |s|^2 + (\lambda_h^2 - \lambda_l^2)\,c_\rho\,(2|s| - c_\rho) & : |s| > c_\rho . \end{cases} $$

Assumption 4 on g is violated for total variation regularization in its
original formulation by Rudin et al. [21]. In this case our mathematical framework
cannot guarantee the existence of a minimizer of (5) in H¹(Ω), and in turn we
have no existence theory for the partial differential equation (4). However, this
does not mean that it is impossible to establish similar results by using other
mathematical tools in the proofs.

The functional ‖u_h‖_{L²(Ω)} can also be shown to be continuous in h ≥ 0.
Regarding spatial smoothness, the solution belongs to H²(Ω). This result is
weaker than for the diffusion case.

In analogy to diffusion filtering, the average grey level invariance

$$ \int_\Omega u_h\,dx = \int_\Omega f\,dx \quad \text{for all } h > 0 $$

and the extremum principle

$$ a \le u_h \le b \quad \text{for all } h > 0 $$

can be established.

Moreover, Lyapunov functionals for regularization methods can be constructed
in a similar way. For all r ∈ C²[a,b] with r'' ≥ 0, the function

$$ V(h) := \phi(u_h) := \int_\Omega r(u_h(x))\,dx \qquad (6) $$

is a Lyapunov functional:

1. φ(u_h) ≥ φ(Mf) for all h > 0.

2. a) V ∈ C[0,∞),

b) DV(h) := ∫_Ω r'(u_h)(u_h − u_0) dx ≤ 0 for all h > 0 (with u_0 = f),

c) V(h) − V(0) ≤ 0 for all h > 0.
Here, a difference between Lyapunov functionals for diffusion processes and
regularization methods becomes evident. For Lyapunov functionals in diffusion
processes we have V'(t) ≤ 0, whereas in regularization processes we have DV(h) ≤ 0.
DV(h) is obtained from V'(t) by making a time-discrete ansatz at time 0: since
V'(t) = ∫_Ω r'(u) ∂_t u dx, replacing ∂_t u by the backward difference (u_h − u_0)/h
yields DV(h)/h. We note that this is exactly the way we compared diffusion filtering
and regularization techniques. It is therefore natural that the role of the time
derivative in diffusion filtering is taken over by the time-discrete approximation
around 0.

Again, these Lyapunov functionals allow us to prove convergence of the filtered
images to a constant image as h → ∞. For d = 3, however, the convergence
result is slightly weaker than in the diffusion case:

d = 1: u_h converges uniformly to Mf for h → ∞,
d = 2: lim_{h→∞} ‖u_h − Mf‖_{L^p(Ω)} = 0 for any 1 ≤ p < ∞,
d = 3: lim_{h→∞} ‖u_h − Mf‖_{L^p(Ω)} = 0 for any 1 ≤ p < 6.

4 Iterated Regularization 

Regularization can be applied iteratively, where the regularized solution of the
previous step serves as the initial image for the next iteration. Since each
iteration is one implicit time step of the diffusion process, iterated regularization
with small regularization parameters becomes a good approximation to a nonlinear
diffusion filter.

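Let us consider an iterative regularization process with positive regularization
parameters h_k, k = 1, 2, ..., such that the corresponding “diffusion time”
t_n := Σ_{k=1}^{n} h_k tends to ∞ for n → ∞ (with t_0 := 0). The n-th iteration
reads as follows:

$$ \begin{cases} \dfrac{u(x,t) - u(x,t_{n-1})}{t - t_{n-1}} = \nabla \cdot \big(g(|\nabla u|^2)\,\nabla u\big)(x,t) & t \in (t_{n-1}, t_n],\ x \in \Omega \\ \partial_n u(x,t) = 0 & x \in \Gamma \\ u(x,0) = f(x) & x \in \Omega \end{cases} \qquad (7) $$

where now t − t_{n−1} serves as the regularization parameter in the interval (t_{n−1}, t_n].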
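Iterated regularization is then a loop over such implicit steps, each taking the previous result as its data. The sketch below uses a 1D Tikhonov step (g ≡ 1, unit grid spacing, hypothetical names) and compares ten steps with h = 1 against a single step with h = 10, both reaching the diffusion time t = 10. For a nonlinear g each step would instead require solving a nonlinear problem, which this sketch does not attempt.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: row i reads lower[i-1]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1]."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def tikhonov_step(u_prev, h):
    """One regularization step with data u_prev: solve (I + h L) u = u_prev."""
    n = len(u_prev)
    diag = [1.0 + h * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = [-h] * (n - 1)
    return solve_tridiagonal(off, diag, off, u_prev)

def iterated_regularization(f, steps):
    """Apply tikhonov_step with parameters h_1, h_2, ...; return (t_n, u_n) pairs."""
    t, u = 0.0, list(f)
    out = [(t, u)]
    for h in steps:
        u = tikhonov_step(u, h)   # previous result serves as the new initial image
        t += h                    # diffusion time t_n = h_1 + ... + h_n
        out.append((t, u))
    return out

f = [0.0, 0.0, 5.0, 1.0, 1.0, 8.0, 2.0, 2.0]
single = tikhonov_step(f, 10.0)                  # one step, t = 10
path = iterated_regularization(f, [1.0] * 10)    # ten steps of h = 1, t = 10
iterated = path[-1][1]
```

Every intermediate result keeps the mean of f and stays within [min f, max f], and the discrete L² norm decreases from step to step, mirroring the statements for the continuous iterated scheme.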

It is now possible to prove a similar scale-space theory as for noniterated 
regularization [20]. The main results are given below. 

Under the same assumptions as for the noniterated case there exists a unique
minimizer u(·,t). Moreover, the functional ‖u(·,t)‖_{L²(Ω)} is continuous for t ≥ 0.

In contrast to the noniterated case, the spatial smoothness improves in each
iteration step: after n iterations the solution u(·,t) is in the Sobolev space
H^{2n}(Ω) for fixed t ∈ (t_{n−1}, t_n] (provided the diffusivity g is sufficiently
smooth). This suggests that, if one uses the regularized solution for calculating
derivatives of order 2n, one should perform at least n iterations.

As for noniterated regularization, the average grey level invariance and the 
extremum principle hold. 
Even at the risk of boring the reader, we introduce for the sake of completeness
Lyapunov functionals for iterated regularization: for all r ∈ C²[a,b] with r'' ≥ 0,
the function

$$ V(t) := \phi(u(\cdot,t)) := \int_\Omega r(u(x,t))\,dx \qquad (8) $$

is a Lyapunov functional:

1. φ(u(·,t)) ≥ φ(Mf) for all t > 0.

2. a) V ∈ C[0,∞),

b) DV(t) := ∫_Ω r'(u(x,t)) (u(x,t) − u(x,t_{n−1})) dx ≤ 0
for all t ∈ (t_{n−1}, t_n],

c) V(t) − V(t_{n−1}) ≤ 0 for all t ∈ (t_{n−1}, t_n].

In contrast to noniterated regularization, Lyapunov functionals for iterated
regularization methods are based on the time-discrete ansatz at t_{n−1}.
Nevertheless, the convergence results from Section 3 carry over literally.

5 Experiments 

The numerical experiments are performed using the software package Diffpack
from the University of Oslo / Numerical Objects [6]. We have implemented the
diffusion equation with g̃(|∇u|²) = √(|∇u|² + β²) + α|∇u|², which is a modified
total variation regularization. For this diffusion filter our theoretical results
are applicable. The term α|∇u|² is only of theoretical interest: in numerical
realizations the discretized version of the gradient is bounded, and there is no
visible difference between using very small values of α (for which the theoretical
results are applicable) and α = 0 (where our theoretical results do not hold).

Our experiments were carried out for different sequences of time steps and
various smoothing parameters β. The influence of the parameter settings is as
follows.

The impact of β on the numerical reconstruction is hardly visible over the range
of β values we tested. Even the convergence rate is hardly affected, although it is
slower for smaller β.

For small values of regularization parameters h (up to approximately 5.0), 
there is no visible difference between iterated and noniterated regularization. 
The effect can only be seen for larger values of h. This is illustrated in Figure 2. 
It shows the result of noniterated and iterated regularization applied to the 2D 
MR image from Figure 1(a). The results are depicted at times t = 10, 30, and 
100, respectively. For noniterated regularization this is achieved in one step, and 
for iterated regularization the regularization parameter h = 1 was chosen and
10, 30, or 100 iterations were performed. We observe that differences between 
the two methods are very small. They only become evident when subtracting 
one image from the other. This also indicates that even the semigroup property 
of regularization methods is well approximated in practice. It should be noted 
Fig. 1. Test images. (a) Left: MR image with additive Gaussian noise
(SNR = 1). (b) Right: Rendering of a three-dimensional ultrasound data
set of a human fetus.



that the semigroup property is an ideal continuous concept which can only be 
approximated in time-discrete algorithms for partial differential equations. 

As can be seen from the previous sections, the scale-space framework for 
noniterated and iterated regularization methods carries over to higher space 
dimensions. In Figure 3 we present results from a three-dimensional ultrasound
data set of a fetus with 80 × 80 × 80 voxels. Also in this case the differences
between noniterated and iterated regularization are very small and iterated regu- 
larization appears to give slightly smoother results. This is in complete accor- 
dance with the theory derived in the previous sections. 



6 Conclusions 

Traditionally, scale-space techniques have been linked to parabolic or hyperbolic 
partial differential equations [2]. The novelty of our paper consists of estab- 
lishing a parameter dependent elliptic boundary value problem (noniterated 
regularization) and a sequence of elliptic problems (iterated regularization) as 
scale-space techniques. They satisfy the same scale-space properties as nonlinear 
diffusion filtering. The key ingredient for understanding this relation is the inter- 
pretation of regularization methods as time-implicit approximations to diffusion 
processes. In this sense, the scale-space theory of regularization methods is also 
a novel semi-discrete theory for diffusion filtering. This time-discrete framework
completes the theory of diffusion scale-spaces where up to now only results for 
the continuous, the space-discrete and the fully discrete setting have been formu- 
lated [29]. 

The synthesis of regularization techniques and diffusion methods may lead to 
a deeper understanding of both fields, and it is likely that many more results can 
be transferred from one of these areas to the other. It would e.g. be interesting 
[Figure 2 panels, one row each for t = 10, 30 and 100: one iteration | 10, 30 or 100 iterations | difference]

Fig. 2. Results for the MR image from Figure 1(a) with noniterated and iterated
regularization (β = 0.001). The left column shows the results for noniterated, the
middle column for iterated regularization. The images in the right column depict the
modulus of the differences between the results for the iterated and noniterated method.
[Figure 3 panels: 1 iteration, t = 20 | 10 iterations, t = 20]

Fig. 3. Results for the three-dimensional ultrasound data from Figure 1(b) with
β = 0.001. The left column shows the renderings for noniterated, the right column for
iterated regularization. The regularization parameter for iterated regularization was
h = 2.
to study how results for optimal parameter selection in regularization methods
can be used for diffusion filtering. It is also promising to analyse and juxtapose 
efficient numerical techniques developed in both frameworks. First steps in this 
direction are reported in [30]. 
Abstract. Nonlinear diffusion methods have proved to be powerful
methods in the processing of 2D and 3D images. They allow a denoising 
and smoothing of image intensities while retaining and enhancing edges. 
On the other hand, compression is an important topic in image process- 
ing as well. Here a method is presented which combines the two aspects 
in an efficient way. It is based on a semi-implicit Finite Element im- 
plementation of nonlinear diffusion. Error indicators guide a successive 
coarsening process. This leads to locally coarse grids in areas of resulting 
smooth image intensity, while enhanced edges are still resolved on fine 
grid levels. Special emphasis has been put on algorithmic aspects such
as storage requirements and efficiency. Furthermore, a new nonlinear 
anisotropic diffusion method for vector field visualization is presented. 



1 Introduction 

Nonlinear diffusion methods in image processing have been known for a long 
time. In 1987 Perona and Malik [17] introduced a continuous diffusion model 
which allows the denoising of images together with the enhancing of edges. The 
diffusion driven evolution is started on an initial image intensity. In general, it 
is either noisy because of unavoidable measurement errors, or it carries partially 
hidden patterns which have to be intensified and outlined [9,23]. Such an image 
smoothing and feature restoration process can be understood as a successive 
coarsening while certain structures are retained on a fine scale - an approach 
which is closely related to the major techniques in image compression. 

Finite Element methods are widely used to discretize and appropriately implement
the diffusion based models. Their general convergence properties were studied,
for instance, by Kačur and Mikula [13]. Furthermore, Schnörr applied Finite
Elements in a variational approach to image processing [19]. In various areas of
scientific computing, adaptive Finite Element methods [6,4] have been incorporated
to substantially reduce the required degrees of freedom while conserving
the approximation quality of the numerical solution. There, locally defined reliable
error estimators or error indicators steer the local grid refinement or coarsening
[22,5]. The image intensities resulting from the nonlinear parabolic evolution are
obviously well suited to be resolved on adaptive grids.
As time evolves, a successive coarsening in areas of smooth image intensity is 
near at hand. For instance, in the case of a d-dimensional image whose intensity
is constant on piecewise smoothly bounded regions, we obtain the same image
quality on an adaptive grid of complexity O(N^{d−1} log N) as on a regular grid of
complexity O(N^d). The cost of the numerical algorithm, the storage requirements,
and the transmission time on computer networks scale with this complexity in
terms of actual degrees of freedom.
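To get a feeling for these complexities, the snippet below evaluates both estimates for d = 3, with all constants set to one and a base-2 logarithm (both assumptions; only the asymptotic orders come from the text). The helper name `dof_estimates` is hypothetical.

```python
import math

def dof_estimates(N, d):
    """Regular grid O(N^d) versus adaptive O(N^(d-1) log N), constants set to 1."""
    return N ** d, N ** (d - 1) * math.log2(N)

for N in (128, 256, 512):
    regular, adaptive = dof_estimates(N, 3)
    print(f"N = {N}: regular {regular:,}, adaptive ~{adaptive:,.0f}, "
          f"ratio ~{regular / adaptive:.0f}")
```

For N = 512 this gives about 1.3 × 10⁸ nodes on the regular grid versus about 2.4 × 10⁶ for the adaptive estimate; the actual savings depend on the image content and on the constants hidden in the O-notation.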

These efficiency perspectives were first studied by Bänsch and Mikula [3],
who presented an adaptive Finite Element method. This method is based on
simplicial grids generated by bisection, which are then successively coarsened in
the diffusion process. The major shortcoming of their approach is the enormous
memory requirement for the data structures describing the adaptive grid and for
the sparse matrices used in the linear solver in each implicit time step. Therefore,
large 3D images, as they are common in medical imaging, are difficult to
manage on moderately sized workstations.

Here we present an adaptive multilevel Finite Element method which avoids
these shortcomings and comes with minimal storage requirements. The
specific ingredients of our method are:

- adaptive quadtrees and octrees, with accompanying piecewise bilinear or
trilinear Finite Element spaces, are handled procedurally only,

- error indicators on grid nodes and a suitable threshold value implicitly
describe the adaptive grid (no explicit adaptive grid structure is required),

- by invoking a certain saturation condition for the nodal indicators, we ensure
robustness and only one-level transitions on the resulting adaptive grid,

- the adaptive Finite Element space is defined as an implicitly constrained
discrete space on the full grid,

- the grid is completely handled procedurally,

- and instead of dealing with explicitly stored sparse matrices, the hierarchically
preconditioned linear solver in each time step uses “on-the-fly” matrix
multiplication based on efficient grid traversals.

Let us mention that this approach benefits from general and efficient multilevel 
data post processing methodology [16,18] and is related to the multilevel methods 
discussed in [1,24]. 

Finally, as a new area of application (to our knowledge) we present a scale-space
method in vector field visualization. Flow visualization is an important
task in scientific visualization. Simply drawing vector plots at the nodes of some
overlaid regular grid generally produces visual clutter. The central goal is to
come up with intuitive methods with more comprehensible results. They should
provide an overall as well as a detailed view of the flow patterns. Several
techniques generating such textures based on discrete models have been presented
[8,15,20,21]. We ask for a continuous model which leads to stretched streamline-type
patterns aligned with the vector field. Furthermore, the possibility
to successively coarsen this pattern is obviously a desirable property. For
the generation of such field-aligned flow patterns we apply anisotropic nonlinear
diffusion. A matrix-valued diffusion coefficient controls the anisotropy, as in
Weickert’s method [25] to restore and enhance lower-dimensional structures in
images.

2 FE-Discretization of Nonlinear Diffusion 

Let us look at the modified Perona-Malik [17] model proposed by Catté, Lions,
Morel, and Coll [9]. Without loss of generality we consider the domain Ω := [0,1]^d,
d = 2, 3, and ask for the solution of the following nonlinear parabolic boundary and
initial value problem: find ρ : ℝ⁺_0 × Ω → ℝ such that

$$ \partial_t \rho - \operatorname{div}\big(A(\nabla \rho_\sigma)\,\nabla \rho\big) = f(\rho) \quad \text{in } \mathbb{R}^+ \times \Omega , $$
$$ \rho(0,\cdot) = \rho_0 \quad \text{on } \Omega , $$
$$ \frac{\partial \rho}{\partial \nu} = 0 \quad \text{on } \mathbb{R}^+ \times \partial\Omega , $$

where in the basic model A = g for a non-negative, monotonically decreasing function
g : ℝ⁺_0 → ℝ⁺ satisfying lim_{s→∞} g(s) = 0, e.g. a diffusivity of Perona-Malik type,
and ρ_σ is a mollification of ρ with some smoothing kernel. We interpret the solution
ρ for increasing t ∈ ℝ⁺ as a successively filtered version of ρ_0. Owing to the shape
of g, the diffusion is of regularized backward type [14] in regions of high image
gradients, while noisy regions of ρ_0 are smoothed by dominant diffusion.

We solve this problem numerically by applying a bilinear, respectively trilinear, conforming Finite Element discretization on an adaptive quadrilateral, respectively hexahedral, grid. In time a semi-implicit second order Euler scheme is used. As has become standard, the scheme is semi-implicit with respect to the evaluation of the nonlinear diffusion coefficient $g$ and the right hand side. The computation of the mollified intensity $\rho_\sigma$ is based on a single short timestep of the corresponding heat equation (linear diffusion) with given data $\rho$ [13]. In the $i$th timestep we have to solve the linear system $(M + \tau L(\rho_\sigma^{i-1}))\,\rho^i = M\rho^{i-1} + F$, where $\rho^i$ is the corresponding solution vector consisting of the nodal values, $\tau$ the current timestep, $M$ the lumped mass matrix, $L(\rho_\sigma^{i-1})$ the weighted stiffness matrix and $F$ the vector representation of the right hand side. In this application the growth of $F$ is moderate compared to chemical reaction diffusion equations; accordingly, we have not observed any instabilities caused by this source term. The stiffness matrix and the right hand side are computed by applying the midpoint quadrature rule.
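To make the time step concrete, the following minimal Python/NumPy sketch assembles and solves a system of the form $(M + \tau L)\rho^i = M\rho^{i-1} + \tau M f(\rho^{i-1})$ for piecewise linear elements on a uniform 1D grid. It is an illustration under our own simplifying assumptions, not the authors' code: one space dimension instead of bilinear/trilinear elements, $g(s) = (1+s^2)^{-1}$, $\rho_\sigma$ replaced by $\rho^{i-1}$ (no mollification), a right hand side scaled by $\tau$, and a dense direct solve. The function names are hypothetical.

```python
import numpy as np


def g(s):
    """Perona-Malik type diffusivity g(s) = 1 / (1 + s^2) (assumed choice)."""
    return 1.0 / (1.0 + s * s)


def lumped_mass(n):
    """Diagonal of the lumped mass matrix for n nodes on a uniform grid of [0, 1]."""
    h = 1.0 / (n - 1)
    m = np.full(n, h)
    m[0] = m[-1] = h / 2
    return m


def semi_implicit_step(rho_old, tau, f=None):
    """Solve (M + tau L(rho_old)) rho_new = M rho_old + tau M f(rho_old).

    Piecewise linear elements on a uniform grid over [0, 1]; the homogeneous
    Neumann condition is the natural boundary condition of the weak form.
    """
    n = rho_old.size
    h = 1.0 / (n - 1)
    m = lumped_mass(n)
    # The gradient of a linear element is constant, so evaluating g at the
    # element midpoint (midpoint rule) simply uses this constant gradient.
    w = g(np.abs(np.diff(rho_old) / h)) / h
    L = np.zeros((n, n))
    for e in range(n - 1):              # element e couples nodes e and e + 1
        L[e, e] += w[e]
        L[e + 1, e + 1] += w[e]
        L[e, e + 1] -= w[e]
        L[e + 1, e] -= w[e]
    source = np.zeros(n) if f is None else f(rho_old)
    rhs = m * rho_old + tau * m * source
    return np.linalg.solve(np.diag(m) + tau * L, rhs)
```

Because the rows of $L$ sum to zero and its off-diagonal entries are nonpositive, this discrete step conserves the lumped mass $\sum_k m_k \rho_k$ for $f = 0$ and does not create new extrema.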

The above linear system, as well as the linear system resulting from the mollification by the heat equation kernel, is solved by a preconditioned conjugate gradient method. We use the Bramble-Pasciak-Xu (BPX) preconditioner [7], thereby making appropriate use of the given grid hierarchy.
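The structure of a preconditioned conjugate gradient solver is sketched below in Python. As a simplifying assumption it uses a diagonal (Jacobi) preconditioner in place of the hierarchical BPX preconditioner, and the matrix enters only through a function `apply_A`, which fits the matrix-free setting described next. All names are illustrative.

```python
import numpy as np


def pcg(apply_A, b, apply_Minv, x0=None, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients for a symmetric positive definite operator.

    apply_A(x)    -> A @ x     (may be evaluated procedurally, no stored matrix)
    apply_Minv(r) -> P^{-1} r  (application of the preconditioner)
    """
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - apply_A(x)
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

For example, with a small SPD matrix `A` and `d = np.diag(A)`, the call `pcg(lambda v: A @ v, b, lambda r: r / d)` returns the solution of `A x = b`.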

As already mentioned above, a peculiarity of our scheme is that no matrices are stored explicitly. Instead, the multiplication of the mass, respectively the stiffness, matrix with a coefficient vector of nodal values is done procedurally: in each step the hierarchical adaptive grid is traversed, and element-wise local contributions are evaluated and successively assembled into the resulting coefficient vector. In this way we avoid storing the matrices explicitly.

Fig. 1. Left: element types in two and three dimensions and their refinements. Right: a grid configuration with hanging nodes.



Otherwise we would have been unable to handle typical 3D applications with more than 10 million nodes. Furthermore, this procedural access offers considerable scope for code optimization, for instance through a cache-optimal numbering of the nodes.
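The procedural matrix-vector product can be illustrated on a uniform 1D grid of linear elements: instead of storing $M + \tau L$, each element's $2\times 2$ local matrix is formed on the fly and its contribution is added directly to the result vector. This is a simplified sketch under our own assumptions (1D, lumped mass, precomputed element diffusivities `g_elem`); the paper's implementation works on adaptive quadtree/octree grids.

```python
import numpy as np


def apply_system_matrix(u, g_elem, h, tau):
    """Return w = (M + tau L) u for linear elements on a uniform 1D grid,
    evaluating element contributions on the fly instead of storing M or L.

    u      : nodal values, length n
    g_elem : diffusivity per element, length n - 1 (e.g. g(|grad rho_sigma|))
    h      : element width
    tau    : time step
    """
    w = np.zeros_like(u, dtype=float)
    stiff = np.array([[1.0, -1.0], [-1.0, 1.0]])
    for e, ge in enumerate(g_elem):          # traverse all elements
        idx = [e, e + 1]                     # local -> global dof mapping
        # Local lumped mass matrix plus weighted local stiffness matrix.
        local = (h / 2) * np.eye(2) + tau * ge / h * stiff
        w[idx] += local @ u[idx]             # assemble into the result vector
    return w
```

Only the current element's $2\times 2$ matrix exists at any time, so memory grows with the number of nodes rather than with the number of matrix entries.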

3 Grid Adaptivity and Error Indicators 

In this section we discuss an adaptive approach to the problem of nonlinear diffusion. We focus especially on the choice and handling of error indicator values on the grid nodes, which steer the adaptive algorithm. We will see that saturation plays an essential role in the robustness and implementability of the proposed algorithm. In fact, referring solely to saturated error indicator information, and not to some explicit grid hierarchy, enables us to define and handle appropriate adaptive meshes for the nonlinear diffusion algorithm.
Let us assume that our image has $2^{l_{\max}} + 1$ pixels in each direction for some $l_{\max} \in \mathbb{N}$. The degrees of freedom are interpreted as nodal values of a regular grid with $2^{d\,l_{\max}}$ elements for $d = 2, 3$. Above this fine grid level we define a quadtree, respectively octree, hierarchy of elements with $l_{\max} + 1$ grid levels. In each local refinement step an element $E$ is subdivided into a set $\mathcal{C}(E)$ of $2^d$ child elements (cf. Fig. 1). Conversely, we denote by $\mathcal{P}(E)$ the parent of an element $E$. In each refinement step new grid nodes $x$ appear. They are expressed as weighted sums over their parent nodes $x_P \in \mathcal{P}(x)$ from the set of coarser grid level nodes: $x = \sum_{x_P \in \mathcal{P}(x)} \omega(x, x_P)\, x_P$. The weights $\omega(x, x_P) \in \{\tfrac12, \tfrac14, \tfrac18\}$ depend on the type of the new node, which might be the center of a 1D edge, a 2D face, or a 3D hexahedron. Let us denote by $\mathcal{N}_c(E)$ the set of new nodes on an element $E$.

We suppose the grid to be adaptive, i.e. depending on the data, the recursive refinement is stopped locally on elements of different grid levels. Thereby a sequence of nested, successively refined grids $\mathcal{M}^0, \dots, \mathcal{M}^{l_{\max}}$ is generated. On this sequence we define discrete function spaces $\{V^l\}_{0 \le l \le l_{\max}}$ consisting of continuous piecewise bilinear, respectively trilinear, functions, which are ordered by set inclusion: $V^0 \subset V^1 \subset \dots \subset V^l \subset V^{l+1} \subset \dots \subset V^{l_{\max}}$. Let $\{\phi_i^l\}_i$ denote the basis of $V^l$ consisting of hat functions, i.e. if $\{x_1, \dots, x_N\}$ denotes the set of non-constrained vertices of $\mathcal{M}^l$, we have $\phi_i^l(x_j) = \delta_{ij}$, $j = 1, \dots, N$. Here a vertex is called constrained, or a hanging node, if it is not generated by refinement on every adjacent element (cf. Fig. 1). On adaptive quadtrees, respectively octrees, such hanging nodes are unavoidable. The handling of the corresponding nodal values is crucial for the efficiency of the resulting adaptive numerical algorithm. We choose an efficient implicit processing, which is described below.

Usually, for time-dependent problems a grid modification consisting of the refinement and coarsening of elements is necessary at certain time steps. In our setting we start on the initial fine grid, and it suffices to coarsen elements, since there is in general no spatial movement of the image edges and the complete information of the image is coded on the initial grid. This coarsening is obtained by prescribing a data-dependent, boolean-valued stopping criterion $S(E)$ on elements, which induces local stopping in a recursive depth-first traversal of the hierarchical grid. It turned out to be suitable to let this element stopping criterion depend on a corresponding criterion $S(x)$ on the nodes, respectively basis functions. The nodal criterion $S(\cdot)$ distinguishes which degrees of freedom are actually important, respectively which nodal values can be generated by interpolation of some coarse grid function. If $\eta(x)$ is some error indicator on the nodes $x$ and $\varepsilon$ is a prescribed threshold value, we obtain such an interpolation criterion by $S(x) := (\eta(x) < \varepsilon)$. Given an image intensity $\rho \in V^{l_{\max}}$, an intuitive choice for an error indicator is $\eta(x) := |\nabla \rho(x)|$, because the gradient of an image acts as an edge indicator. Hence in regions with nearly constant intensity the grid will be coarsened substantially, whereas in the vicinity of high gradients, which indicate edges to be preserved, the grid size is kept fine.
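A minimal sketch of the nodal interpolation criterion $S(x) = (\eta(x) < \varepsilon)$ with the gradient indicator $\eta(x) = |\nabla\rho(x)|$ on a unit-spaced pixel grid follows. Approximating the gradient by central differences via `numpy.gradient` is our own assumption; the text does not specify the discrete gradient.

```python
import numpy as np


def gradient_indicator(rho):
    """eta(x) = |grad rho(x)| at every node, via central differences (assumed)."""
    gy, gx = np.gradient(rho.astype(float))   # derivatives along axis 0 and axis 1
    return np.hypot(gx, gy)


def interpolation_criterion(rho, eps):
    """S(x) = (eta(x) < eps): True where the nodal value may be interpolated."""
    return gradient_indicator(rho) < eps
```

For an image with a vertical step edge, `S` is False only in the two columns adjacent to the edge, so only there would the fine grid be kept.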

The stopping criterion on elements is motivated by the fact that in the next refinement step only interpolated nodal values would appear. To ensure that every descendant nodal value on such an element, including those on finer grid levels, is interpolated, we require the following natural saturation condition on the error indicator:

(Saturation Condition) An error indicator value $\eta(x)$ for $x \in \mathcal{N}(E)$ is always greater than or equal to every error indicator value $\eta(x_c)$ for $x_c \in \mathcal{N}_c(E)$.

In general the saturation condition is not fulfilled, but we can modify the error indicator in a preprocessing step. Typically, this turns out to be necessary only on coarse grid levels. A simple update algorithm for an error indicator $\eta$, and thereby for the corresponding projection criterion $S$, is the following bottom-up traversal of the grid hierarchy, starting on the second finest level and ending on the macro grid.

for l = l_max - 1 to 0 step -1 do
    for each element E of M^l do
        η* := max{ η(x_c) : x_c ∈ N_c(E) };
        for all x ∈ N(E) do if (η(x) < η*) η(x) := η*;
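The bottom-up saturation update can be sketched in Python for a 2D quadtree over a $(2^{l_{\max}}+1)^2$ node grid. Assumptions of this sketch: $\mathcal{N}(E)$ is taken to be the four corner nodes of $E$, $\mathcal{N}_c(E)$ the five nodes created when $E$ is refined (edge midpoints and center), and $\eta^*$ the maximum indicator over $\mathcal{N}_c(E)$; the array layout and function name are our own.

```python
import numpy as np


def saturate(eta, lmax):
    """Bottom-up saturation of a nodal error indicator on a 2D quadtree.

    eta has shape (2**lmax + 1, 2**lmax + 1). Elements of level l have edge
    length 2**(lmax - l) in fine-grid index units. Returns a modified copy.
    """
    eta = eta.astype(float).copy()
    for l in range(lmax - 1, -1, -1):           # second finest level up to macro grid
        size = 2 ** (lmax - l)
        half = size // 2
        for i in range(0, 2 ** lmax, size):      # lower-left corner (i, j) of element E
            for j in range(0, 2 ** lmax, size):
                # New nodes N_c(E): four edge midpoints and the element center.
                new = [(i + half, j), (i + half, j + size), (i, j + half),
                       (i + size, j + half), (i + half, j + half)]
                eta_star = max(eta[p] for p in new)
                # Corner nodes N(E) are raised to at least eta_star.
                for p in [(i, j), (i + size, j), (i, j + size), (i + size, j + size)]:
                    if eta[p] < eta_star:
                        eta[p] = eta_star
    return eta
```

A single large fine-grid value, e.g. at node (1, 1) with $l_{\max} = 2$, is propagated up to the corners of the macro element, so the coarse element is never stopped prematurely. Nodes created on level $l$ are never corners of coarser elements, so later (coarser) passes cannot break the condition established on finer levels.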

Let us emphasize that a depth-first traversal of the hierarchy in the adjustment procedure would not be sufficient: nodes on element boundaries also receive contributions from neighbouring elements, so all elements of a finer level have to be processed before the next coarser one. This saturation process “transports” fine grid error information up to the coarse grid levels and prevents us from overlooking important fine grid details [16]. Furthermore, the saturation condition comes along with another desirable property: the corresponding element stopping criterion implies only one-level grid transitions at element faces of the actual adaptive grid (cf. Fig. 1). Thus, the possible hanging node configurations are confined to the
Fig. 2. Several timesteps (from left to right) of the selective image smoothing on adaptive grids.



basic one-level cases, i.e. any open face and any edge of an element $E$ contains at most one hanging node (cf. [11] for a general treatment of hanging nodes). Finally, this has straightforward implications for ensuring the continuity of discrete Finite Element functions and for the corresponding matrix assembly in the implementation of our nonlinear diffusion algorithm. In general, on regular grids continuity is guaranteed by identifying each local degree of freedom (dof) with the global dof in the assembly of the global stiffness matrices and the right hand side of the corresponding discrete linear problem. However, hanging nodes of the adaptive grid do not represent dofs, since they depend on other dofs. Therefore, when assembling the global stiffness matrices, we have to distribute the contributions of the hanging nodes onto the constraining dofs. This amounts to nothing but procedurally respecting the appropriate interpolation conditions. For future use let us introduce the following notation:

– NDEP(i) = number of constraints of the node with local index i of an element. We define NDEP(i) := 1 if the node is not constrained.

– CCOEF(i, j) = list of constraint coefficients. In our case we always have CCOEF(i, j) = 1/NDEP(i) for j = 1, ..., NDEP(i).

– CDOFM(i, j) = list of global dofs that constrain the node i, j = 1, ..., NDEP(i). For non-hanging nodes CDOFM(i, 1) coincides with the global dof of node i.
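As a small hypothetical illustration of these tables (our own example, not taken from the paper), consider a 2D element one of whose four local nodes hangs at the midpoint of an edge of a coarser neighbour. That node is constrained by the two edge endpoints with coefficients $1/2$, and its value is the weighted sum of the constraining dofs. Python indices start at 0, whereas the lists above are numbered from 1.

```python
# Hypothetical lookup tables for one element with four local nodes.
# Local node 2 hangs at the midpoint of a coarse edge whose endpoints are global dofs 7 and 9.
NDEP = {0: 1, 1: 1, 2: 2, 3: 1}
CDOFM = {0: [3], 1: [4], 2: [7, 9], 3: [5]}
CCOEF = {i: [1.0 / NDEP[i]] * NDEP[i] for i in NDEP}   # CCOEF(i, j) = 1 / NDEP(i)


def local_value(i, u):
    """Value at local node i: interpolation of its constraining global dofs."""
    return sum(c * u[d] for c, d in zip(CCOEF[i], CDOFM[i]))
```

With `u = {3: 0.0, 4: 1.0, 5: 2.0, 7: 2.0, 9: 4.0}`, the hanging node evaluates to `local_value(2, u) == 3.0`, the mean of its two constraining values.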

The CCOEF values are identical to the weights in the above node generation rule. Figure 2 shows the application of the resulting adaptive algorithm to the selective smoothing of a noisy image. In Figures 3 and 4 we have applied the algorithm to a 3D data set [12]. Figure 5 shows results obtained by applying nonlinear diffusion to image segmentation. The approach is based on a continuous multilevel analogue of the watershed algorithm.
Fig. 3. Nonlinear diffusion applied to a 3D medical data set. Several slices through the adaptive grid are depicted, showing the corresponding image intensity as well as the intersection lines with element faces.



4 Procedural Grid Handling and Matrix Multiplication 

As already mentioned in Section 2, the hierarchical grid is handled solely procedurally, and the matrix multiplications required in the linear system solver are performed on the fly by traversing the adaptive grid recursively. Let us now describe this in more detail.

During the traversal, the information needed to identify an element $E$ is generated recursively. When the recursive traversal routine reaches a leaf element of the adaptive grid, i.e. an element for which $S(E)$ is true (or an element on the finest level), a callback method performs some action on that element; for instance, it calculates the local right hand side. An element $E$ is identified by the index vector of its lower left corner, its grid level and its refinement type. All other information, such as the element's size, the mapping of local dofs to global dofs, and the constrained dofs, is stored in lookup tables, as already mentioned in Section 3. In 2D the hierarchical traversal can be formulated in pseudo code as follows:

sub traverse(i, j, lev, refType, callback, params)
    if (lev ≠ l_max) and ¬S(element) do
        offset = 2^(l_max - lev - 1);
        traverse(i, j, lev+1, 0, callback, params);
        traverse(i + offset, j, lev+1, 1, callback, params);
        traverse(i + offset, j + offset, lev+1, 2, callback, params);
        traverse(i, j + offset, lev+1, 3, callback, params);
    else callback(i, j, lev, refType, params);
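A runnable Python version of this traversal is sketched below. The stopping predicate `stop(i, j, lev)` stands in for $S(E)$, the child offset is taken to be half the element size in fine-grid index units, and the callback simply records the leaf elements; these choices and the numbering of the four children are illustrative assumptions.

```python
def traverse(i, j, lev, ref_type, lmax, stop, callback, params):
    """Depth-first traversal of a 2D quadtree without storing the tree.

    An element is identified by its lower-left fine-grid index (i, j),
    its level lev and its refinement type (position among its siblings).
    """
    if lev != lmax and not stop(i, j, lev):
        offset = 2 ** (lmax - lev - 1)   # half the element size in fine-grid units
        traverse(i, j, lev + 1, 0, lmax, stop, callback, params)
        traverse(i + offset, j, lev + 1, 1, lmax, stop, callback, params)
        traverse(i + offset, j + offset, lev + 1, 2, lmax, stop, callback, params)
        traverse(i, j + offset, lev + 1, 3, lmax, stop, callback, params)
    else:
        callback(i, j, lev, ref_type, params)


# Usage: refine only elements touching the lower-left corner (hypothetical criterion).
lmax = 3
leaves = []
traverse(0, 0, 0, 0, lmax,
         stop=lambda i, j, lev: not (i == 0 and j == 0),
         callback=lambda i, j, lev, t, acc: acc.append((i, j, lev)),
         params=leaves)
```

The collected leaves tile the $8 \times 8$ fine grid exactly: four finest-level elements at the corner, three elements on level 2 and three on level 1.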

We can also formulate the “on-the-fly” matrix-vector multiplication using this callback traversal. Multiplying a given vector $u$ by the matrix $M + \tau L(\rho_\sigma)$ and assembling the result in a vector $w$ requires the following local callback procedure:
Fig. 4. A transparent isosurface visualization of the brain data set, smoothed by nonlinear diffusion (cf. Fig. 3).




Fig. 5. Brain segmentation on slices of an MRT image by nonlinear diffusion. Consecutive timesteps of the corresponding evolution are depicted.



sub matrixProduct(i, j, lev, refType, (u, w))
    for each pair l, k of local dofs
        for lc = 1 to NDEP(l), kc = 1 to NDEP(k)
            w(CDOFM(k, kc)) += localMatrix(k, l) * CCOEF(l, lc) * CCOEF(k, kc) * u(CDOFM(l, lc));



Similarly, the adaptive BPX preconditioner can be implemented.
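The constrained local product can be sketched in Python as follows, assuming per-element tables in the spirit of NDEP, CDOFM and CCOEF (here plain nested lists, indexed from 0) and a precomputed local element matrix. The function name and data layout are our own; for an element without hanging nodes the routine reduces to ordinary element-wise assembly.

```python
def matrix_product_callback(local_matrix, ndep, cdofm, ccoef, u, w):
    """Add the contribution of one leaf element to w = A u.

    local_matrix[k][l] : entry of the local element matrix for local dofs k, l
    ndep[i]            : number of global dofs constraining local node i
    cdofm[i][c]        : c-th global dof constraining local node i
    ccoef[i][c]        : corresponding interpolation weight
    """
    n_local = len(ndep)
    for k in range(n_local):
        for l in range(n_local):
            a = local_matrix[k][l]
            for kc in range(ndep[k]):
                for lc in range(ndep[l]):
                    # Interpolate u at node l, apply a_kl, distribute onto node k's dofs.
                    w[cdofm[k][kc]] += a * ccoef[k][kc] * ccoef[l][lc] * u[cdofm[l][lc]]
```

For a 1D element whose second node hangs between global dofs 1 and 2, with `u = [0.0, 2.0, 4.0]` and local matrix `[[1, -1], [-1, 1]]`, the hanging value is 3, and the result `[-3.0, 1.5, 1.5]` shows its contribution split evenly onto the two constraining dofs.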

5 Application to Flow Visualization 

As already sketched in the introduction, we now apply nonlinear anisotropic diffusion to vector field visualization. We consider diffusive smoothing along streamlines and edge enhancement in the orthogonal directions. Applied to an initial random noise image, this generates a scale of successively coarser patterns which represent the flow field.

For a given smooth vector field $v : \Omega \to \mathbb{R}^n$ we define a continuous family of orthogonal mappings $B(v) \in SO(n)$ such that $B(v)\,v = \|v\|\,e_0$,
Fig. 6. A single timestep of the nonlinear diffusion method applied to the vector field describing the flow around an obstacle at a fixed time. Discrete white noise is used as initial data. The evolution is run with a small (left) and a large (right) constant diffusion coefficient $\alpha$.




Fig. 7. Several timesteps are depicted from the nonlinear anisotropic evolution applied 
to a convective flow field in a 2D box. 



where $\{e_i\}_{i=0,\dots,n-1}$ is the standard basis of $\mathbb{R}^n$. We consider a diffusion matrix $A = A(v, \nabla\rho_\sigma)$ and define

$$A(v, \nabla\rho_\sigma) := B(v)^T \operatorname{diag}\big(\alpha(\|v\|),\, g(\|\nabla\rho_\sigma\|),\, \dots,\, g(\|\nabla\rho_\sigma\|)\big)\, B(v),$$

where $\alpha : \mathbb{R}^+ \to \mathbb{R}^+$ controls the linear diffusion in the direction of the vector field, i.e. along streamlines, and the edge-enhancing diffusion coefficient $g(\cdot)$ introduced above acts in the orthogonal directions. We may either choose a linear function $\alpha$ or, in the case of a velocity field that varies spatially over several orders of magnitude, select a monotone function $\alpha$ with $\alpha(0) > 0$ and $\lim_{s\to\infty}\alpha(s) = \alpha_{\max}$. Unlike in the problems studied by Weickert in [25], in our case no canonical initial data are given. To avoid aliasing artifacts we therefore choose some random noise $\rho_0$ of an appropriate frequency range. This can, for instance, be generated by running a linear isotropic diffusion simulation on discrete white noise for a short time. During the evolution the random pattern grows upstream and downstream, whereas the edges tangential to these patterns are successively enhanced. There is still some diffusion perpendicular to the field, which, as time evolves, supplies us with a scale of progressively coarser representations of the flow field. If the evolution is run with a vanishing right hand side $f$, the image contrast unfortunately decreases successively, and the asymptotic limit would be an averaged grey value. Therefore, we select an appropriate contrast
Fig. 8. Convective patterns in a 2D flow field are displayed and emphasized by the method of anisotropic nonlinear diffusion. The images show the velocity field of the flow at different timesteps. The resulting alignment is with respect to the streamlines of this time-dependent flow.



enhancing right hand side $f : [0,1] \to \mathbb{R}$ with $f(0) = f(1) = 0$, $f > 0$ on $(0.5, 1)$, and $f < 0$ on $(0, 0.5)$ (cf. the reaction diffusion problems in image analysis studied in [2,10]). Finally, we end up with the method of nonlinear anisotropic diffusion for visualizing complex vector fields.
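For $n = 2$ the diffusion matrix can be assembled pointwise as in the following sketch, which reads the construction above as $A = B(v)^T \operatorname{diag}(\alpha(\|v\|), g(\|\nabla\rho_\sigma\|))\,B(v)$ with $B(v)$ the rotation mapping $v/\|v\|$ onto $e_0$. The particular choices $\alpha(s) = \alpha_{\max} - (\alpha_{\max}-\alpha_0)e^{-s}$, $g(s) = (1+s^2)^{-1}$ and the cubic $f(s) = c\,s\,(s - \tfrac12)(1 - s)$, which satisfies $f(0) = f(1) = 0$, $f > 0$ on $(0.5, 1)$ and $f < 0$ on $(0, 0.5)$, are hypothetical examples rather than the authors' functions.

```python
import numpy as np


def alpha(s, alpha0=0.1, alpha_max=1.0):
    """Monotone along-stream diffusivity, alpha(0) = alpha0 > 0, limit alpha_max (assumed)."""
    return alpha_max - (alpha_max - alpha0) * np.exp(-s)


def g(s):
    """Edge-enhancing diffusivity g(s) = 1 / (1 + s^2) (assumed)."""
    return 1.0 / (1.0 + s * s)


def diffusion_matrix(v, grad_rho_sigma):
    """A = B(v)^T diag(alpha(|v|), g(|grad rho_sigma|)) B(v) in 2D (requires v != 0)."""
    speed = np.hypot(v[0], v[1])
    c, s = v / speed                      # unit direction of the field
    B = np.array([[c, s], [-s, c]])       # rotation with B @ v = speed * e_0
    lam = np.diag([alpha(speed), g(np.hypot(*grad_rho_sigma))])
    return B.T @ lam @ B


def f(s, strength=1.0):
    """Contrast-enhancing reaction term: f(0) = f(1) = 0, f < 0 on (0, .5), f > 0 on (.5, 1)."""
    return strength * s * (s - 0.5) * (1.0 - s)
```

By construction $v$ is an eigenvector of $A$ with eigenvalue $\alpha(\|v\|)$, and any vector perpendicular to $v$ is an eigenvector with eigenvalue $g(\|\nabla\rho_\sigma\|)$.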

We expect almost everywhere convergence to $\rho(\infty,\cdot) \in \{0,1\}$ due to the choice of the contrast enhancing function $f(\cdot)$. The set of asymptotic limits significantly influences the richness of the developing pattern. One way to enrich this set is to consider a vector-valued $\rho : \Omega \to [0,1]^m$ for some $m > 1$ and a corresponding system of parabolic equations. Now the nonlinear diffusion coefficient $g(\cdot)$ is assumed to depend on the norm $\|\nabla\rho\|$ of the Jacobian $\nabla\rho$ of the vector-valued density, and as right hand side we define $f(\rho) = h(\|\rho\|)\,\rho$. Here $h(s) = f(s)/s$ for $s \neq 0$, where $f$ is the old right hand side from the scalar case, and $h(0) = 0$. Finally, the random initial density is assumed to take values in $B_1(0) \cap [0,1]^m$. The contrast enhancement then leads to asymptotic values which are either $0$ or lie on the sphere sector $S^{m-1} \cap [0,1]^m$ in $\mathbb{R}^m$. This method is capable of nicely depicting the global structure of flow fields, including saddle points, vortices, and stagnation points on the boundary, as indicated by Figures 7 and 8. There the anisotropic diffusion method is applied to an incompressible Bénard convection problem in a rectangular box with heating from below and cooling from above. The formation of convection rolls leads to an exchange of temperature.
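The vector-valued reaction term $f(\rho) = h(\|\rho\|)\rho$, with $h(s) = f(s)/s$ for $s \neq 0$ and $h(0) = 0$, can be written as below; the scalar cubic is again a hypothetical stand-in for the authors' contrast function. Since $\|f(\rho)\| = |f(\|\rho\|)|$, the term only changes the length of $\rho$, pushing it towards $0$ or towards the unit sphere, and leaves its direction unchanged.

```python
import numpy as np


def f_scalar(s):
    """Hypothetical scalar contrast function with f(0) = f(1) = 0."""
    return s * (s - 0.5) * (1.0 - s)


def f_vector(rho):
    """f(rho) = h(|rho|) rho with h(s) = f(s) / s for s != 0 and h(0) = 0."""
    norm = np.linalg.norm(rho)
    if norm == 0.0:
        return np.zeros_like(rho)
    return (f_scalar(norm) / norm) * rho
```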

The anisotropic nonlinear diffusion problem has already been formulated in Section 2 for arbitrary space dimension. Unlike in 2D, in 3D we have to break up the volume in some way to open up the view to inner regions. Here a further benefit of the vector-valued diffusion comes into play. The asymptotic limits that differ from 0 are, on average, equally distributed on $S^{m-1} \cap [0,1]^m$. Hence, we reduce the informational content and focus on a ball-shaped neighbourhood $B_\delta(\omega)$ of a certain point $\omega \in S^{m-1} \cap [0,1]^m$ (cf. Fig. 9).
Fig. 9. The incompressible flow in a water basin with two interior walls, an inlet (on the left) and an outlet (on the right), visualized by anisotropic nonlinear diffusion. Isosurfaces show the preimage of $\partial B_\delta(\omega)$ (for different values of $\delta$) under the vector-valued mapping $\rho$ for some point $\omega$ on $S^{m-1} \cap [0,1]^m$. Color indicates the velocity.



6 Conclusions 

We have discussed an adaptive Finite Element method for the discretization of nonlinear diffusion methods in large scale image processing. In particular, we have introduced a new method to process adaptive grids and the corresponding mass and stiffness matrices procedurally, without storing any matrix or any graph structure for the hierarchical tree of elements. Thus the method enables the handling of large images ($257^3$ dofs and more) on moderately sized workstations. Furthermore, a new method for 2D and 3D flow visualization has been presented.
Abstract. This paper presents an interpretation of a classic optical flow
method by Nagel and Enkelmann as a tensor-driven anisotropic diffusion 
approach in digital image analysis. We introduce an improvement into 
the model formulation, and we establish well-posedness results for the 
resulting system of parabolic partial differential equations. Our method 
avoids linearizations in the optical flow constraint, and it can recover 
displacement fields which are far beyond the typical one-pixel limits that 
are characteristic for many differential methods for optical flow recovery. 
A robust numerical scheme is presented in detail. We avoid convergence 
to irrelevant local minima by embedding our method into a linear scale- 
space framework and using a focusing strategy from coarse to fine scales. 
The high accuracy of the proposed method is demonstrated by means of 
a synthetic and a real-world image sequence. 



1 Introduction 

Optical flow computation consists of finding the apparent motion of objects in a sequence of images. It is a key problem in artificial vision, and much research has been devoted to this field; for a survey see e.g. [23].

In the present paper we consider two images $I_1(x, y)$ and $I_2(x, y)$ (defined on $\mathbb{R}^2$ to simplify the discussion) which represent two consecutive views in a sequence of images. Under the assumption that corresponding pixels have equal grey values, the determination of the optical flow from $I_1$ to $I_2$ comes down to finding a function $h(x, y) = (u(x, y), v(x, y))$ such that

$$I_1(x, y) = I_2\big(x + u(x, y),\, y + v(x, y)\big), \qquad \forall (x, y) \in \mathbb{R}^2. \tag{1}$$

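To compute $h(x, y)$, the preceding equality is usually linearized, yielding the so-called optical flow constraint

$$I_1(\mathbf{x}) - I_2(\mathbf{x}) \approx \langle \nabla I_2(\mathbf{x}), h(\mathbf{x}) \rangle, \qquad \forall \mathbf{x} \in \mathbb{R}^2,$$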

M. Nielsen et al. (Eds.): Scale-Space’99, LNCS 1682, pp. 235—246, 1999. 
where $\mathbf{x} := (x, y)$. The linearized optical flow constraint assumes that the object displacements $h(\mathbf{x})$ are small or that the image is slowly varying in space. In other cases, this linearization is no longer valid.
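A small 1D experiment (our own illustration with a hypothetical sinusoidal image) shows why the linearization breaks down for large displacements: for $I_2(x) = \sin x$ and $I_1(x) = I_2(x + u)$, solving the linearized constraint $I_1(x) - I_2(x) = I_2'(x)\,u$ for $u$ recovers a small shift accurately but not a large one.

```python
import numpy as np


def linearized_estimate(true_u, x=0.3):
    """Solve I1(x) - I2(x) = I2'(x) * u for u, with I2 = sin and I1(x) = I2(x + true_u)."""
    i2 = np.sin(x)
    i1 = np.sin(x + true_u)
    di2 = np.cos(x)          # derivative of I2 at x
    return (i1 - i2) / di2
```

For `true_u = 0.01` the estimate is about 0.00998, whereas for `true_u = 3.0` it is about -0.47, i.e. not even the sign is correct.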

Frequently, instead of equation (1), researchers use the alternative equality

$$I_1\big(x - u(x, y),\, y - v(x, y)\big) = I_2(x, y), \qquad \forall (x, y) \in \mathbb{R}^2. \tag{2}$$

In this case the displacement $h(x, y)$ is centered in the image $I_2(x, y)$.

The determination of optical flow is a classic ill-posed problem in computer vision [7], and it has to be supplemented with additional regularizing assumptions. The regularization by Horn and Schunck [16] assumes that the optical flow field is smooth. However, since many natural image sequences are better described in terms of piecewise smooth flow fields separated by discontinuities, much research has been done to modify the Horn and Schunck approach in order to permit such discontinuous flow fields; see e.g. [8,10,11,21,24,25,27,32] and the references therein.

An important improvement in this direction was achieved by Nagel and Enkelmann [24] in 1986. They consider the following minimization problem:



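$$E_{NE}(h) = \int_{\mathbb{R}^2} \big(I_1(x - u(x,y),\, y - v(x,y)) - I_2(x,y)\big)^2\, d\mathbf{x} + C \int_{\mathbb{R}^2} \operatorname{trace}\big((\nabla h)^T D(\nabla I_1)\, (\nabla h)\big)\, d\mathbf{x}, \tag{3}$$

where $C$ is a positive constant and $D(\nabla I_1)$ is a regularized projection matrix in the direction perpendicular to $\nabla I_1$:

$$D(\nabla I_1) = \frac{1}{|\nabla I_1|^2 + 2\lambda^2} \left\{ \begin{pmatrix} \dfrac{\partial I_1}{\partial y} \\[4pt] -\dfrac{\partial I_1}{\partial x} \end{pmatrix} \begin{pmatrix} \dfrac{\partial I_1}{\partial y} \\[4pt] -\dfrac{\partial I_1}{\partial x} \end{pmatrix}^T + \lambda^2\, \mathrm{Id} \right\}.$$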

In this formulation, $\mathrm{Id}$ denotes the identity matrix. The advantage of this method is that it inhibits blurring of the flow across boundaries of $I_1$ where $|\nabla I_1| \gg \lambda$. This model, however, uses an optical flow constraint which is centered in $I_2$, while the projection matrix $D$ in the smoothness term depends on $I_1$. This inconsistency may lead to erroneous results for large displacement fields. In order to avoid this problem, we consider a modified energy functional where both the optical flow constraint and the smoothness constraint are related to $I_1$:



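$$E(h) = \int_{\mathbb{R}^2} \big(I_1(x,y) - I_2(x + u(x,y),\, y + v(x,y))\big)^2\, d\mathbf{x} + C \int_{\mathbb{R}^2} \operatorname{trace}\big((\nabla h)^T D(\nabla I_1)\, (\nabla h)\big)\, d\mathbf{x}. \tag{4}$$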
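The associated Euler-Lagrange equations are given by the PDE system

$$C\, \operatorname{div}\big(D(\nabla I_1)\nabla u\big) + \big(I_1(\mathbf{x}) - I_2(\mathbf{x} + h(\mathbf{x}))\big)\, \frac{\partial I_2}{\partial x}(\mathbf{x} + h(\mathbf{x})) = 0, \tag{5}$$

$$C\, \operatorname{div}\big(D(\nabla I_1)\nabla v\big) + \big(I_1(\mathbf{x}) - I_2(\mathbf{x} + h(\mathbf{x}))\big)\, \frac{\partial I_2}{\partial y}(\mathbf{x} + h(\mathbf{x})) = 0. \tag{6}$$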

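In this paper, we are interested in solutions of the equations (5)-(6) in the case of large displacement fields and images that are not necessarily slowly varying in space. Therefore, we do not introduce any linearization in the above system. We obtain the solutions by calculating the asymptotic state ($t \to \infty$) of the parabolic system

$$\frac{\partial u}{\partial t} = C\, \operatorname{div}\big(D(\nabla I_1)\nabla u\big) + \big(I_1(\mathbf{x}) - I_2(\mathbf{x} + h(\mathbf{x}))\big)\, \frac{\partial I_2}{\partial x}(\mathbf{x} + h(\mathbf{x})), \tag{7}$$

$$\frac{\partial v}{\partial t} = C\, \operatorname{div}\big(D(\nabla I_1)\nabla v\big) + \big(I_1(\mathbf{x}) - I_2(\mathbf{x} + h(\mathbf{x}))\big)\, \frac{\partial I_2}{\partial y}(\mathbf{x} + h(\mathbf{x})). \tag{8}$$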

Interestingly, this coupled system of diffusion-reaction equations reveals a diffusion tensor which resembles the one used for edge-enhancing anisotropic diffusion filtering. Indeed, $D(\nabla I_1)$ has the eigenvectors $v_1 := \nabla I_1$ and $v_2 := \nabla I_1^{\perp}$. The corresponding eigenvalues are given by

$$\lambda_1(|\nabla I_1|) = \frac{\lambda^2}{|\nabla I_1|^2 + 2\lambda^2}, \tag{9}$$

$$\lambda_2(|\nabla I_1|) = \frac{|\nabla I_1|^2 + \lambda^2}{|\nabla I_1|^2 + 2\lambda^2}. \tag{10}$$

We observe that $\lambda_1 + \lambda_2 = 1$ holds independently of $\nabla I_1$. In the interior of objects we have $|\nabla I_1| \to 0$, and therefore $\lambda_1 \to 1/2$ and $\lambda_2 \to 1/2$. At ideal edges where $|\nabla I_1| \to \infty$, we obtain $\lambda_1 \to 0$ and $\lambda_2 \to 1$. Thus, we have isotropic behaviour within regions, and at image boundaries the process smoothes anisotropically along the edge. This behaviour is very similar to edge-enhancing anisotropic diffusion filtering [30], and it is also close in spirit to the modified mean-curvature motion considered in [3]. In this sense, one may regard the Nagel-Enkelmann method as an early predecessor of modern PDE techniques for image restoration. For a detailed treatment of anisotropic diffusion filtering we refer to [31], and an axiomatic classification of mean-curvature motion and related morphological PDEs for image analysis is presented in [2].
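The Nagel-Enkelmann matrix and its eigenvalue relations can be checked numerically with the following sketch; the gradient value and $\lambda$ used in the check are arbitrary.

```python
import numpy as np


def nagel_enkelmann_matrix(grad, lam):
    """D(grad I1) = ((I_y, -I_x)^T (I_y, -I_x) + lam^2 Id) / (|grad I1|^2 + 2 lam^2)."""
    ix, iy = grad
    w = np.array([iy, -ix])              # direction perpendicular to grad I1
    return (np.outer(w, w) + lam**2 * np.eye(2)) / (ix**2 + iy**2 + 2 * lam**2)
```

For $\nabla I_1 = (3, 4)$ and $\lambda = 1$ the matrix scales $\nabla I_1$ by $\lambda_1 = 1/27$ and the perpendicular direction $(4, -3)$ by $\lambda_2 = 26/27$, and its trace $\lambda_1 + \lambda_2$ equals 1.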

Without any linearization, the optical flow constraint may cause a nonconvex energy functional (4). In this case we cannot expect uniqueness of the solutions of the elliptic system (5)-(6), and the asymptotic state of the above parabolic system depends on the initial data for the flow $u$ and $v$. In order to encourage convergence to the physically correct solution in the case of large displacement flow, we design a linear scale-space focusing procedure for the optical flow constraint. A scale-space approach also enables us to perform a finer and more reliable scale focusing than would be possible with related pyramid [4] or multigrid approaches [12].

The paper is organized as follows: In Section 2 we sketch existence and uniqueness results for the nonlinear parabolic system (7)-(8). In Section 3 we apply a linear scale-space focusing which enables us to achieve convergence to realistic solutions for large displacement vectors. Section 4 describes a numerical discretization of the parabolic system (7)-(8) based on an explicit finite difference scheme. In Section 5 we present experimental results on a synthetic and a real-world image sequence. Finally, in Section 6 we conclude with a summary.

Related work. Proesmans et al. [25] studied a related approach that also dispenses with a linearization of the optical flow constraint in order to allow for larger displacements. Their method, however, requires six coupled partial differential equations, and its nonlinear diffusion process uses a scalar-valued diffusivity instead of a diffusion tensor. Their discontinuity-preserving smoothing is flow-driven, while ours is image-driven. Another PDE technique that is similar in spirit to the work of Proesmans et al. is a stereo method due to Shah [28]. With respect to embeddings into a linear scale-space framework, our method can be related to the optical flow approach of Florack et al. [14]. Their method differs from ours in that it is purely linear, applies scale selection mechanisms and does not use discontinuity-preserving nonlinear smoothness terms. Our focusing strategy for avoiding irrelevant local minima also resembles the graduated non-convexity (GNC) algorithms of Blake and Zisserman [9].

2 Existence and Uniqueness of the Parabolic System

Next we investigate the parabolic system of nonlinear partial differential equations (7)-(8). In [1], the authors develop a theoretical framework to study the existence and uniqueness of solutions of a similar parabolic system, but with a different regularization term. The main techniques used in [1] can be applied in order to obtain the existence and uniqueness of the solutions of the system (7)-(8). This leads to the following result.

Theorem 1. Let $I_2 \in \dots$ and $I_1 \in C^{\dots}(\mathbb{R}^2)$. Then the parabolic system (7)-(8) has a unique generalized solution $h(\cdot, t) \in C([0, \infty); \dots \times \dots)$ for all initial flows $h_0 \in \dots \times \dots$.

3 A Linear Scale-Space Approach to Recover Large 
Displacements 

In general, the Euler-Lagrange equations (5)-(6) will have multiple solutions. As a consequence, the asymptotic state of the parabolic system (7)-(8), which we use for approximating the optical flow, will depend on the initial data. Typically, convergence is better the closer the initial data are to the asymptotic state. When we expect small displacements in the scene, the natural choice is to take $u = v = 0$ as initialization of the flow. For large displacement fields, however, this may not work, and we need better initial data. To this end, we embed our method into a linear scale-space framework [17,33]. Considering the problem at a coarse scale prevents the algorithm from getting trapped in physically irrelevant local minima. The coarse-scale solution then serves as initial data for solving the problem at a finer scale. Scale focusing has a long tradition in linear scale-space theory (see e.g. Bergholm [6] for an early approach), and although several theoretical problems exist, it has not lost its popularity due to its favourable practical behaviour. Detailed descriptions of linear scale-space theory can be found in [13,15,18,19,22,29].

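We proceed as follows. First, we introduce a linear scale factor into the parabolic PDE system in order to end up with

$$\frac{\partial u_\sigma}{\partial t} = C\, \operatorname{div}\big(D(\nabla G_\sigma * I_1)\nabla u_\sigma\big) + \big(G_\sigma * I_1(\mathbf{x}) - G_\sigma * I_2(\mathbf{x} + h_\sigma(\mathbf{x}))\big)\, \frac{\partial (G_\sigma * I_2)}{\partial x}(\mathbf{x} + h_\sigma(\mathbf{x})), \tag{11}$$

$$\frac{\partial v_\sigma}{\partial t} = C\, \operatorname{div}\big(D(\nabla G_\sigma * I_1)\nabla v_\sigma\big) + \big(G_\sigma * I_1(\mathbf{x}) - G_\sigma * I_2(\mathbf{x} + h_\sigma(\mathbf{x}))\big)\, \frac{\partial (G_\sigma * I_2)}{\partial y}(\mathbf{x} + h_\sigma(\mathbf{x})), \tag{12}$$

where $G_\sigma * I$ represents the convolution of $I$ with a Gaussian of standard deviation $\sigma$.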
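A separable implementation of $G_\sigma * I$ in the spatial domain might look as follows. In line with the description in Section 4, the sampled Gaussian is truncated at $5\sigma$ and renormalized to unit sum; the mirrored boundary treatment (`np.pad` with `mode="symmetric"`) and the function names are our own assumptions, and $\sigma > 0$ is required.

```python
import numpy as np


def gaussian_kernel(sigma, truncate=5.0):
    """Sampled 1D Gaussian, truncated at truncate * sigma and renormalized to sum 1."""
    radius = max(1, int(np.ceil(truncate * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()


def gaussian_smooth(image, sigma):
    """G_sigma * I by separable convolution with mirrored boundaries."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image.astype(float), r, mode="symmetric")
    # Convolve along rows, then along columns; 'valid' removes the padding again.
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, tmp)
```

Because the renormalized kernel is nonnegative and sums to one, every output value is a convex combination of input values, so smoothing never leaves the original grey-value range.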

The convolution with a Gaussian blends the information in the images and allows us to recover a connection between the objects in $I_1$ and $I_2$. We start with a large initial scale $\sigma_0$. Then we compute the optical flow $(u_{\sigma_0}, v_{\sigma_0})$ at scale $\sigma_0$ as the asymptotic state of the solution of the above PDE system, using $u = v = 0$ as initial data. Next, we choose a number of scales $\sigma_n < \sigma_{n-1} < \dots < \sigma_0$, and for each scale $\sigma_i$ we compute the optical flow $(u_{\sigma_i}, v_{\sigma_i})$ as the asymptotic state of the above PDE system with initial data $(u_{\sigma_{i-1}}, v_{\sigma_{i-1}})$. The final computed flow corresponds to the smallest scale $\sigma_n$. In accordance with the logarithmic sampling strategy in linear scale-space theory [20], we choose $\sigma_i := \eta^i \sigma_0$ with some decay rate $\eta \in (0, 1)$.
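The coarse-to-fine focusing loop can be sketched as follows. `solve_at_scale` is a hypothetical placeholder for computing the asymptotic state of (11)-(12) at a fixed scale from given initial data; the schedule $\sigma_i = \eta^i \sigma_0$ with $\sigma_0 = 10$ and $\eta = 0.95$ reproduces the scales reported in Section 5 (e.g. $\sigma_{10} \approx 5.99$, $\sigma_{70} \approx 0.28$).

```python
import numpy as np


def scale_schedule(sigma0, eta, n):
    """Scales sigma_i = eta**i * sigma0 for i = 0, ..., n (logarithmic sampling)."""
    return [sigma0 * eta**i for i in range(n + 1)]


def focus(solve_at_scale, shape, sigma0=10.0, eta=0.95, n=70):
    """Coarse-to-fine focusing: the flow at scale sigma_{i-1} initializes scale sigma_i."""
    u = np.zeros(shape)          # u = v = 0 at the coarsest scale
    v = np.zeros(shape)
    for sigma in scale_schedule(sigma0, eta, n):
        u, v = solve_at_scale(sigma, u, v)
    return u, v                  # flow at the smallest scale sigma_n
```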






4 Numerical Scheme 



We discretize the parabolic system (11)-(12) by finite differences. All spatial derivatives are approximated by central differences, and for the discretization in $t$ direction we use an explicit (Euler forward) scheme. Gaussian convolution was performed in the spatial domain with renormalized Gaussians, which were truncated at 5 times their standard deviation. Let $D(\nabla G_\sigma * I_1) = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$. Then







L. Alvarez, J. Weickert, and J. Sanchez 



our explicit scheme has the structure 



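$$
\begin{aligned}
\frac{u^{k+1}_{i,j}-u^{k}_{i,j}}{\tau} = C\,\Big(&\frac{a_{i+1,j}+a_{i,j}}{2}\,\frac{u^{k}_{i+1,j}-u^{k}_{i,j}}{h_1^2}+\frac{a_{i-1,j}+a_{i,j}}{2}\,\frac{u^{k}_{i-1,j}-u^{k}_{i,j}}{h_1^2}\\
&+\frac{c_{i,j+1}+c_{i,j}}{2}\,\frac{u^{k}_{i,j+1}-u^{k}_{i,j}}{h_2^2}+\frac{c_{i,j-1}+c_{i,j}}{2}\,\frac{u^{k}_{i,j-1}-u^{k}_{i,j}}{h_2^2}\\
&+\frac{b_{i+1,j+1}+b_{i,j}}{2}\,\frac{u^{k}_{i+1,j+1}-u^{k}_{i,j}}{2h_1h_2}+\frac{b_{i-1,j-1}+b_{i,j}}{2}\,\frac{u^{k}_{i-1,j-1}-u^{k}_{i,j}}{2h_1h_2}\\
&-\frac{b_{i+1,j-1}+b_{i,j}}{2}\,\frac{u^{k}_{i+1,j-1}-u^{k}_{i,j}}{2h_1h_2}-\frac{b_{i-1,j+1}+b_{i,j}}{2}\,\frac{u^{k}_{i-1,j+1}-u^{k}_{i,j}}{2h_1h_2}\Big)\\
&+\big(I_{1,\sigma}(x_{i,j})-I_{2,\sigma}(x_{i,j}+h^{k}_{\sigma,i,j})\big)\,I_{2,x,\sigma}(x_{i,j}+h^{k}_{\sigma,i,j}),
\end{aligned}
\tag{13}
$$

$$
\begin{aligned}
\frac{v^{k+1}_{i,j}-v^{k}_{i,j}}{\tau} = C\,\Big(&\frac{a_{i+1,j}+a_{i,j}}{2}\,\frac{v^{k}_{i+1,j}-v^{k}_{i,j}}{h_1^2}+\frac{a_{i-1,j}+a_{i,j}}{2}\,\frac{v^{k}_{i-1,j}-v^{k}_{i,j}}{h_1^2}\\
&+\frac{c_{i,j+1}+c_{i,j}}{2}\,\frac{v^{k}_{i,j+1}-v^{k}_{i,j}}{h_2^2}+\frac{c_{i,j-1}+c_{i,j}}{2}\,\frac{v^{k}_{i,j-1}-v^{k}_{i,j}}{h_2^2}\\
&+\frac{b_{i+1,j+1}+b_{i,j}}{2}\,\frac{v^{k}_{i+1,j+1}-v^{k}_{i,j}}{2h_1h_2}+\frac{b_{i-1,j-1}+b_{i,j}}{2}\,\frac{v^{k}_{i-1,j-1}-v^{k}_{i,j}}{2h_1h_2}\\
&-\frac{b_{i+1,j-1}+b_{i,j}}{2}\,\frac{v^{k}_{i+1,j-1}-v^{k}_{i,j}}{2h_1h_2}-\frac{b_{i-1,j+1}+b_{i,j}}{2}\,\frac{v^{k}_{i-1,j+1}-v^{k}_{i,j}}{2h_1h_2}\Big)\\
&+\big(I_{1,\sigma}(x_{i,j})-I_{2,\sigma}(x_{i,j}+h^{k}_{\sigma,i,j})\big)\,I_{2,y,\sigma}(x_{i,j}+h^{k}_{\sigma,i,j}).
\end{aligned}
\tag{14}
$$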



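The notation is almost self-explanatory: for instance, $\tau$ is the time step size, $h_1$ and $h_2$ denote the pixel size in $x$ and $y$ direction, respectively, $u^k_{i,j}$ approximates $u_\sigma$ in some grid point $x_{i,j}$ at time $k\tau$, $h^k_{\sigma,i,j} := (u^k_{i,j}, v^k_{i,j})$, and $I_{2,x,\sigma}$ is an approximation to $G_\sigma * \frac{\partial I_2}{\partial x}$.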

We calculate values of type $I_{2,\sigma}(x_{i,j} + h^k_{\sigma,i,j})$ by linear interpolation, and we use the time step size

$$\tau = \frac{0.5}{4\big(C + \max\big(|I_{2,x,\sigma}(x_{i,j})|^2,\, |I_{2,y,\sigma}(x_{i,j})|^2\big)\big)}. \tag{15}$$



This step size can be motivated by a stability analysis in the maximum norm applied to a simplification of (13)-(14) in which a scalar-valued diffusivity and a linearized optical flow constraint are used.
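One explicit update of the form (13) can be written with NumPy array slicing as below. Assumptions of this sketch: the entries $a, b, c$ of $D(\nabla G_\sigma * I_1)$ and the reaction term (the last line of (13)) are precomputed arrays, boundaries are handled by edge replication, and the function name is ours. Applying the same function to $v$ with the $y$-derivative in the reaction term gives (14).

```python
import numpy as np


def explicit_step(u, a, b, c, reaction, tau, C, h1=1.0, h2=1.0):
    """u^{k+1} = u^k + tau * (C * div(D grad u^k) + reaction), stencil as in (13).

    Arrays are indexed [i, j] with i along x and j along y.
    """
    U, A, B, Cc = (np.pad(z, 1, mode="edge") for z in (u, a, b, c))
    n1, n2 = u.shape

    def sh(z, di, dj):
        """Shifted view z[i + di, j + dj] on the interior grid."""
        return z[1 + di:1 + di + n1, 1 + dj:1 + dj + n2]

    u0, a0, b0, c0 = sh(U, 0, 0), sh(A, 0, 0), sh(B, 0, 0), sh(Cc, 0, 0)
    d = 2.0 * h1 * h2
    div = ((sh(A, 1, 0) + a0) / 2 * (sh(U, 1, 0) - u0) / h1**2
           + (sh(A, -1, 0) + a0) / 2 * (sh(U, -1, 0) - u0) / h1**2
           + (sh(Cc, 0, 1) + c0) / 2 * (sh(U, 0, 1) - u0) / h2**2
           + (sh(Cc, 0, -1) + c0) / 2 * (sh(U, 0, -1) - u0) / h2**2
           + (sh(B, 1, 1) + b0) / 2 * (sh(U, 1, 1) - u0) / d
           + (sh(B, -1, -1) + b0) / 2 * (sh(U, -1, -1) - u0) / d
           - (sh(B, 1, -1) + b0) / 2 * (sh(U, 1, -1) - u0) / d
           - (sh(B, -1, 1) + b0) / 2 * (sh(U, -1, 1) - u0) / d)
    return u + tau * (C * div + reaction)
```

With $a = c = 1$ and $b = 0$ the stencil reduces to the standard five-point Laplacian, and for constant $b$ the four diagonal terms approximate the mixed term $2b\,\partial_{xy}u$.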



5 Experimental Results 

The complete algorithm for computing the optical flow depends on a number of 
parameters which have an intuitive meaning: 
— The regularization parameter C specifies the balance between the smoothing term and the optical flow constraint. Larger values lead to smoother flow fields by filling in information from image edges, where flow measurements with higher reliability are available. Recent results show that there is also a close relationship between the parameter C of a regularization method and the scale parameter of a diffusion scale-space [26].

— The constant λ in the smoothing term serves as a contrast parameter: locations where the image gradient magnitude is larger than λ are regarded as edges. The diffusion process smooths anisotropically along these edges. In our experiments we used λ := 1. The results were not very sensitive to underestimations of λ.

— The scale σ_0 denotes the standard deviation of the largest Gaussian. In general, σ_0 is chosen according to the maximum displacement expected. In our case we used σ_0 := 10.

— The decay rate η ∈ (0, 1) for the computation of the scales σ_m := η^m σ_0. We may expect good focusing if η is close to 1. We have chosen η := 0.95.

— The smallest scale, i.e. the last σ_m of the sequence. It should be close to the inner scale of the image in order to achieve optimal flow localization.

— The stopping time T for solving the system (11)-(12) at each scale σ_m. When good initializations from coarser scales are available, we observed that T := 20 gives results that are sufficiently close to the asymptotic state.
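The focusing schedule σ_m = η^m σ_0 is easy to tabulate. The Python sketch below (the helper name `focusing_scales` is ours) reproduces the scales quoted for Figure 2 from σ_0 = 10 and η = 0.95. It covers only the schedule. The flow solver that would run at each σ_m, initialized with the result from the previous scale, is not sketched here.

```python
def focusing_scales(sigma0: float, eta: float, n_steps: int) -> list:
    """Return [sigma_0, sigma_1, ..., sigma_n] with sigma_m = eta**m * sigma0."""
    if not 0.0 < eta < 1.0:
        raise ValueError("the decay rate eta must lie in (0, 1)")
    return [sigma0 * eta ** m for m in range(n_steps + 1)]


scales = focusing_scales(sigma0=10.0, eta=0.95, n_steps=70)
for m in range(10, 71, 10):
    # m = 10 gives 5.99 and m = 70 gives 0.28, as in the Figure 2 caption
    print(f"sigma_{m} = {scales[m]:.2f}")
```

Because η is close to 1, 70 steps are needed to go from σ_0 = 10 down to about 0.28.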

Figure 1 shows our first experiment. We use a synthetic image composed of four black squares on a white background. Each square moves in a different direction and with a different displacement magnitude. Assuming that the x axis is oriented from left to right and the y axis from top to bottom, the top left square moves with (u, v) = (10, 5), the top right square is displaced by (u, v) = (−10, 0), the bottom left square is shifted by (u, v) = (0, −5), and the bottom right square undergoes a translation by (−10, −10). To visualize the flow field (u, v) we use two grey level images (u_gl, v_gl) defined by u_gl := 128 + 8u and v_gl := 128 + 8v. We use the regularization parameter C = 15000. The depicted optical flow was obtained without scale-space focusing, i.e. with σ_0 = 0. As can be expected, the algorithm gets trapped in a physically irrelevant local minimum.
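The grey-level display of the flow is an affine map. The minimal NumPy sketch below uses a hypothetical helper `flow_to_grey`. Clipping to [0, 255] and the 8-bit output type are our assumptions, because the text does not say how out-of-range values are handled.

```python
import numpy as np


def flow_to_grey(u, v):
    """Map flow components to grey values u_gl = 128 + 8u and v_gl = 128 + 8v."""
    u_gl = 128.0 + 8.0 * np.asarray(u, dtype=float)
    v_gl = 128.0 + 8.0 * np.asarray(v, dtype=float)
    # Clipping is an assumption; displacements of up to 15 pixels stay in range.
    return (np.clip(u_gl, 0, 255).astype(np.uint8),
            np.clip(v_gl, 0, 255).astype(np.uint8))


# Ground-truth displacements of the four synthetic squares
u_true = np.array([10, -10, 0, -10])
v_true = np.array([5, 0, -5, -10])
print(flow_to_grey(u_true, v_true))
```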

Figure 2 shows that the proposed scale-space focusing leads to significantly improved results. We start with the initial scale σ_0 = 10 and show the results for focusing to the scales σ_10 = 5.99, σ_20 = 3.58, σ_30 = 2.15, σ_40 = 1.29, σ_50 = 0.77, σ_60 = 0.46, and σ_70 = 0.28, respectively. The other parameters are identical to those in Figure 1. The computed flow is a good approximation of the expected flow. Not only is the orientation of the flow correct, but the flow magnitude is also surprisingly accurate. The maximum of the computed optic flow magnitude is 14.13, a very good approximation of the ground truth maximum 10√2 ≈ 14.14. This maximum is attained in the square that moves in the (−10, −10) direction. This indicates that, under specific circumstances, our method may even yield optical flow results with subpixel accuracy.

Fig. 1. Optic flow obtained without scale-space focusing (T = 800).

This observation is confirmed by the quantitative evaluation in Figure 3. The left plot shows the average angular errors in the four squares of the first frame. The angles between the correct flow (u_c, v_c) and the estimated flow (u_e, v_e) have been calculated in the same way as in [5]. The right plot depicts the Euclidean error √((u_e − u_c)² + (v_e − v_c)²), averaged over all pixels within the four squares of the first frame. In both cases the error is reduced dramatically by focusing down in scale-space, until it reaches a very small value when the Gaussian width σ approaches the inner scale of the image. Further reductions of σ lead to slightly higher errors, which appear to be caused by discretization effects.
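The two error measures can be sketched as follows for reference. We assume that the angular error of [5] is the common measure that treats a flow (u, v) as the 3-D direction (u, v, 1) and reports, in degrees, the angle between the correct and estimated directions. This is our reading, since the text only cites [5]. The function names are illustrative.

```python
import numpy as np


def angular_error_deg(uc, vc, ue, ve):
    """Per-pixel angle (degrees) between (uc, vc, 1) and (ue, ve, 1)."""
    num = uc * ue + vc * ve + 1.0
    den = np.sqrt((uc ** 2 + vc ** 2 + 1.0) * (ue ** 2 + ve ** 2 + 1.0))
    # Clipping guards against round-off pushing the cosine slightly above 1.
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))


def euclidean_error(uc, vc, ue, ve):
    """Per-pixel Euclidean error sqrt((ue - uc)^2 + (ve - vc)^2)."""
    return np.hypot(ue - uc, ve - vc)


def mean_errors(uc, vc, ue, ve, mask):
    """Average both errors over the pixels selected by a boolean mask."""
    ang = angular_error_deg(uc, vc, ue, ve)[mask].mean()
    euc = euclidean_error(uc, vc, ue, ve)[mask].mean()
    return float(ang), float(euc)
```

In the experiment above, `mask` would select the pixels of the four squares in the first frame.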

In the fourth experiment, we use the classical taxi sequence. Instead of taking two consecutive frames, as is usually done, we consider frames 15 and 19. The dark car on the left has the largest displacement, approximately 12 pixels. In Figure 4 we present the flow computed with the regularization parameter C = 500 and focusing from σ_0 = 10 to σ_70 = 0.28. The computed maximal flow magnitude is 11.68, which is a good approximation of the actual displacement of the dark car. Figure 5 shows a vector plot of the computed flow field.



6 Conclusions 

Usually, when computer vision researchers deal with variational methods for optical flow calculations, they linearize the optical flow constraint. Except in cases where the images vary sufficiently slowly in space, however, linearization only works for small displacements. In this paper we investigate
Fig. 2. From top to bottom and from left to right: the original pair of images I_1 and I_2, and the flow components (u_n, v_n) resulting from focusing to the scales σ_10 = 5.99, σ_20 = 3.58, σ_30 = 2.15, σ_40 = 1.29, σ_50 = 0.77, σ_60 = 0.46, and σ_70 = 0.28, respectively.



[Figure 3 plots: "Angular Error" (left) and "Global Error" (right), each against the Gaussian sigma value.]

Fig. 3. Left: Average angular error of the optic flow calculations for the squares in the first frame of Figure 2. Right: Corresponding average Euclidean error.






Fig. 5. Vector plot of the optic flow between frames 15 and 19 of the taxi sequence.
an improved formulation of a classical method by Nagel and Enkelmann in which no linearization is used. We identify this method with two coupled anisotropic diffusion filters with a nonlinear reaction term. We showed that this parabolic system is well-posed from a mathematical viewpoint, and we presented a finite difference scheme for its numerical solution. To prevent the algorithm from converging to physically irrelevant local minima, we embedded it into a linear scale-space approach that focuses the solution from a coarse to a fine scale. The numerical results we have presented for a synthetic and a real-world sequence are very encouraging: it was possible to recover displacements of more than 10 pixels with high accuracy. We hope that this successful blend of nonlinear anisotropic PDEs and linear scale-space techniques may motivate the study of other combinations of linear and nonlinear scale-space approaches in the future.

Acknowledgements. This work has been supported by the European TMR 
network Viscosity Solutions and their Applications. 
Abstract. This paper introduces a new method for analyzing scaling
phenomena in natural images, and draws some consequences as to whether 
natural images belong to the space of functions with bounded variation. 



1 Introduction 

A digital gray-level image may be seen as the realization of a random vector of size H × L taking values in a discrete set V = {1, …, G}. For typical values such as H = L = G = 256, the number of possible realizations, G^{HL} = 256^{65536} = 2^{524288}, is huge.
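The size of G^{HL} can be checked with exact integer arithmetic in Python. The variable names below are ours.

```python
import math

H = L = G = 256
n_pixels = H * L  # 65536 pixels, each taking one of G gray levels

# Python integers are exact, so G**(H*L) can be formed directly.
count = G ** n_pixels
# Since G = 2**8, the count is a power of two: count == 2**exponent.
exponent = count.bit_length() - 1
# Decimal digit count from logarithms (str(count) would exceed the default
# integer-to-string digit limit of recent Python versions).
digits = math.floor(n_pixels * math.log10(G)) + 1

print(exponent, digits)  # 524288 157827
```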

Obviously, “natural images”, i.e. digital photographs of natural scenes, only form 
a small subset of all possible realizations. Looking at random realizations of 
such vectors is enough to be convinced of this fact. Natural images are highly 
improbable events. It is therefore interesting to look for statistical characteristics 
of such images: what are the relationships between gray level values at distant 
pixels? Is it possible to define a probability law for natural images? Moreover, 
statistics of texture images may be useful for synthesis purposes (see [9], [24], 
[23]). 

Most statistical studies of natural images are concerned with first- or second-order statistics (through the power spectrum, the covariances, the co-occurrences) or with additive decompositions of images. The power spectrum P(ω, ν) is known to be well approximated by a power function 1/(ω² + ν²)^{γ/2}, where γ is an image-dependent number usually close to 2 (see [5], [7]). The histogram of natural images has been found to have a peculiar, non-Gaussian shape (see [20], [10]). Nearest-neighbor co-occurrence functions also exhibit non-Gaussian distributions (see [10]). Principal and independent component analysis on databases of such images yields localized and oriented image bases (see [17], [2]). We take a different approach. We work in the image domain on items that have a straightforward visual interpretation and involve (relatively) long-range and high-order interactions between pixels. We shall show that in natural images there is a constant form for the size distribution of such items. The definitions of size we consider are of two types: area and boundary length. An experimental program that we performed on many photographs of very diverse natural scenes indicates that the size distribution of homogeneous parts in images obeys a law

$$ \mathrm{Card}\{\text{homogeneous regions with size } s\} = \frac{K}{s^{\alpha}}, $$

where K is an image-dependent constant. When the size s denotes the area, α is close to 2 in most photographs. In Section 2 we define what we mean by homogeneous parts: the connected components of image domains where the contrast does not exceed a certain threshold. Let us mention that power laws have been observed previously, e.g. for point statistics (see [18], [19]) or for the density of extrema in scale space (see [11]).

As a consequence of the size power law, some information can be obtained 
about the “natural” function space for images, as will be shown in Section 3: we 
focus our attention on the space BV of functions with bounded variation. We 
are in a position to tell when a given image is not in this space, provided the 
observed size distribution model remains true at smaller (not observable) scales 
as well. 



2 Sizes of sections in natural images

2.1 The distribution of areas 

We now make precise what we mean by a homogeneous region of an image. We begin by equalizing the image histogram and then quantize it uniformly, as follows. We consider a digital image I of size H × L with G integer gray levels, and write I(i, j) for the gray level at pixel (i, j). Let k be an integer less than G. Let N_1 be the first integer such that more than HL/k pixels have a gray level less than N_1. Let N_2 be the first integer such that more than 2HL/k pixels have a gray level less than N_2, and define N_3, …, N_k = G in the same way; this sequence may become constant at some point. For l varying from 1 to k, let I_l be the binary image with I_l(i, j) = 1 if I(i, j) ∈ [N_{l−1}, N_l) and I_l(i, j) = 0 otherwise (with N_0 = 0). We call these images the k-bilevels of I. Each bilevel image represents a quantization level of the equalized image.
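The thresholds N_l can be read directly from the cumulative histogram. The following NumPy sketch builds the k-bilevels of an integer image with gray levels 0, …, G − 1. The convention N_0 = 0 and the function name `k_bilevels` are our assumptions, made for illustration only.

```python
import numpy as np


def k_bilevels(image: np.ndarray, k: int, G: int = 256) -> list:
    """Split an integer image (levels 0..G-1) into its k binary bilevel images."""
    img = np.asarray(image, dtype=np.int64)
    n_pix = img.size
    hist = np.bincount(img.ravel(), minlength=G)
    # below[n] = number of pixels with gray level strictly less than n (n = 0..G)
    below = np.concatenate(([0], np.cumsum(hist)))
    thresholds = [0]  # N_0 = 0
    for l in range(1, k):
        # N_l: first integer n such that more than l*HL/k pixels lie below n
        thresholds.append(int(np.argmax(below > l * n_pix / k)))
    thresholds.append(G)  # N_k = G
    # I_l(i, j) = 1 iff N_{l-1} <= I(i, j) < N_l (empty if thresholds coincide)
    return [((img >= lo) & (img < hi)).astype(np.uint8)
            for lo, hi in zip(thresholds[:-1], thresholds[1:])]
```

The bilevels partition the image: every pixel belongs to exactly one of them.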

Next, we look at the area histogram of the connected components of the bilevels. For s an integer varying from 0 to HL, let f(s) be the number of connected components with area s in any of the k-bilevels of I. We consider both 4-connectivity (each pixel has 4 neighbors: up, down, right, left) and 8-connectivity (the diagonal neighbors are added, so that each pixel has 8 neighbors).
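Below is a self-contained sketch of the area histogram f(s). It uses a plain breadth-first flood fill so that both connectivities are explicit; a labeling routine such as `scipy.ndimage.label` with a suitable structuring element would serve equally well. The helper names are ours. The input is any list of binary images of equal size, for instance k-bilevels.

```python
from collections import deque

import numpy as np


def component_areas(binary: np.ndarray, connectivity: int = 8) -> list:
    """Pixel counts of the connected components of the nonzero pixels."""
    if connectivity == 4:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        steps = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                 if (di, dj) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    mask = np.asarray(binary, dtype=bool)
    H, L = mask.shape
    seen = np.zeros((H, L), dtype=bool)
    areas = []
    for i in range(H):
        for j in range(L):
            if not mask[i, j] or seen[i, j]:
                continue
            # Breadth-first flood fill of the component containing (i, j)
            seen[i, j] = True
            queue, area = deque([(i, j)]), 0
            while queue:
                a, b = queue.popleft()
                area += 1
                for di, dj in steps:
                    x, y = a + di, b + dj
                    if 0 <= x < H and 0 <= y < L and mask[x, y] and not seen[x, y]:
                        seen[x, y] = True
                        queue.append((x, y))
            areas.append(area)
    return areas


def area_distribution(bilevels, connectivity: int = 8) -> np.ndarray:
    """f(s) for s = 0..HL: number of components of area s over all bilevels."""
    H, L = np.asarray(bilevels[0]).shape
    areas = [a for b in bilevels for a in component_areas(b, connectivity)]
    return np.bincount(np.asarray(areas, dtype=np.int64), minlength=H * L + 1)
```

Two diagonal pixels form two components under 4-connectivity but one component under 8-connectivity. This is why the choice of connectivity is reported with each experiment.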

We computed the function f on many digital photographs. We did not attempt to use a single source of images: the digitized images are either scanned photographs or come from a digital camera, with diverse optical systems and exposures. These functions are of the form f(s) = C/s^α, with C a constant and α a real number close to two, for values of s in a certain range and reasonable values of k (basically between 4 and 30). The observed fit is excellent, as can be seen from Figure 2, which actually corresponds to one of the worst cases we observed. For fixed k, we consider the set of points

$$ S = \{(\log s, \log f(s)) : 0 < s \le T_{\max}\}, $$

where T_max + 1 is the smallest value of s > 0 such that f(s) = 0. We perform a linear regression on the set S to find the straight line y = A − αx (in the log-log coordinates x = log s, y = log f(s)) closest to S in the least-squares sense, and write E for the least-squares error.
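The regression is a one-line least-squares fit in log-log coordinates. The NumPy sketch below assumes that E is the sum of squared residuals; the text does not specify a normalization. The optional `t_min` argument also covers the restricted point sets S_{T_min} used later in this section for Table 2.

```python
import numpy as np


def fit_power_law(f, t_min: int = 1):
    """Least-squares fit of log f(s) = A - alpha * log s for t_min <= s <= T_max.

    T_max + 1 is the first s >= 1 with f(s) = 0 (or the last index if none).
    Returns (alpha, A, E, T_max), with E the sum of squared residuals.
    """
    f = np.asarray(f, dtype=float)
    zeros = np.flatnonzero(f[1:] == 0)
    t_max = int(zeros[0]) if zeros.size else f.size - 1
    s = np.arange(t_min, t_max + 1)
    x, y = np.log(s), np.log(f[s])
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    return -slope, intercept, float(np.sum(residuals ** 2)), t_max
```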



2.2 The distribution of areas in digital photographs 

Table 1 presents the results for two pictures having different scales and textures. The value of α appears to be related to the amount of texture in the image: the more textured the image, the bigger the value of α. For photographs of natural scenes, α is typically between 1.5 and 3. Values close to 3 are reached for images with textured areas, such as the baboon (Figure 1). For textures (e.g. from the Brodatz album), α is typically between 2.5 and 3.5.




Fig. 1. baboon (512 x 512) and city (612 x 792) images 
Table 1. Different values of the quantization number k for the city and baboon images, 8-connectivity. The area distribution is f(s) = A s^{−α}; T_max is the maximal considered area.
Fig. 2. Function f (area distribution) for the city image (Figure 1), k = 12, 8-connectivity, T_max = 202.



We also performed the linear regression on the sets of points

$$ S_{T_{\min}} = \{(\log s, \log f(s)) : T_{\min} \le s \le T_{\max}\} $$

for various values of T_min. This shows that the fit of S to the power law is not forced by the small areas only. Moreover, even though the contribution to E comes mainly from the large areas, the value of α computed from those large areas is close to the initial value. The results for the city image are shown in Table 2. These results on the stability of the regression slope across scales are of great importance for the hypothesis to be made in Section 3.
Table 2. Different values of T_min for the city image, k = 16, 8-connectivity



2.3 The distribution of boundary lengths in digital photographs 

We performed exactly the same analysis on the boundary lengths of the connected components of bilevels as we did before on their areas. As a discrete definition of the length of a discrete connected set S (8-connectivity), we chose to count the pixels not belonging to S that are neighbors of some pixel of S in the 4-connectivity sense. There are many other ways to define a discrete boundary length. We tried several other methods, which gave basically the same results as the one we detail here. The notations k, E, A, T_min and T_max refer to the same quantities as before. β now stands for the exponent of the power law that best fits the boundary length distribution in the least-squares sense. We chose T_min = 10, because some small boundary lengths are attained only by regions touching the border of the image. The fit to the power law is again very good, and the exponent β is usually between 2 and 3.
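The discrete boundary length defined above can be computed by dilating the component with the 4-neighbor cross and counting the new pixels. In this NumPy sketch (the function name is ours), pixels outside the image are simply not counted. This is one way to see why regions touching the image border can have small boundary lengths.

```python
import numpy as np


def boundary_length(component: np.ndarray) -> int:
    """Count pixels outside S that are 4-neighbours of at least one pixel of S."""
    S = np.asarray(component, dtype=bool)
    grown = S.copy()
    grown[1:, :] |= S[:-1, :]   # neighbour below a pixel of S
    grown[:-1, :] |= S[1:, :]   # neighbour above
    grown[:, 1:] |= S[:, :-1]   # neighbour to the right
    grown[:, :-1] |= S[:, 1:]   # neighbour to the left
    # Pixels beyond the image border are never counted.
    return int(np.count_nonzero(grown & ~S))
```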

We present the results for the city and baboon images in Table 3. We note that β ≈ 2α − 1 corresponds to connected components of bilevel sets that satisfy, on average, a decent isoperimetric ratio, i.e. length/area^{1/2} roughly equal to a constant c. This is the case in general, except for some images of textures.
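The relation β ≈ 2α − 1 follows from a change of variables. To simplify, we assume that every component satisfies ℓ = c s^{1/2} exactly, with s its area and ℓ its boundary length. An area density f(s) = K s^{−α} then transforms into a length density g(ℓ) as follows:

```latex
\begin{aligned}
s &= (\ell / c)^2, \qquad \frac{ds}{d\ell} = \frac{2\ell}{c^2},\\
g(\ell) &= f\big(s(\ell)\big)\,\frac{ds}{d\ell}
         = K\Big(\frac{\ell}{c}\Big)^{-2\alpha}\frac{2\ell}{c^2}
         = 2K\,c^{2\alpha-2}\,\ell^{-(2\alpha-1)},
\end{aligned}
```

so the exponent of the length distribution is β = 2α − 1. When the ratio only stays roughly constant, the relation holds approximately.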
Table 3. Boundary lengths for the city and baboon images, with different values of the quantization number k



Let us mention that the length distribution of the intersections of the homogeneous parts of the image with lines (the so-called intercepts) also follows a power law. In a forthcoming paper (see [1]), we use a morphological model, the dead-leaves model of G. Matheron (see [14]), as an object-based model for images. An image is defined as a sequential superposition of random objects. If we interpret the homogeneous parts as the visible parts of objects after the occlusion process, we can deduce the form of the length distribution of the intercepts from a power law distribution of object sizes. This result is closely related to those of [19], where objects are defined in the image by visual segmentation and where a power law is observed for the covariances.



2.4 Other types of images 

To see whether the power law in some sense characterizes digital photographs, we computed histograms of areas of bilevels for other types of images. We looked at noise images, white or correlated, and at text images.

White noise images, i.e. images in which the gray level values at distinct pixels are independent random variables, have a histogram of the form f(s) = exp(−Cs), with C a constant. We observed this for two different kinds of white noise: uniform and Gaussian. Text images produced by a text editor lead, as one would guess, to a histogram consisting of isolated peaks whose heights are not directly related to the value of s.
Then we looked at correlated noise. We convolved white noise with a Gaussian exp(−(x² + y²)/(2σ²)), where σ is a variable parameter. The convolution was done by multiplication in the frequency domain. Such a convolution can be seen as a crude approximation of the effect of an optical lens. The results we obtained for these images were similar to those for digital photographs of textures. Table 4 presents the results for uniform white noise. We also tested the effect of convolution with Bessel functions (the Fourier transform of disks), and the results were very similar.
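The frequency-domain convolution can be sketched in a few lines of NumPy. The following are our own assumptions: the noise is uniform on [0, 1), the convolution is circular (periodic boundaries, as implied by a plain FFT product), and the kernel is the unit-mass Gaussian of standard deviation σ pixels. The Fourier transform of that kernel is exp(−2π²σ²(f_x² + f_y²)) with frequencies in cycles per pixel.

```python
import numpy as np


def gaussian_correlated_noise(shape, sigma, seed=None):
    """Uniform white noise convolved (circularly) with a Gaussian of std sigma."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(0.0, 1.0, size=shape)
    fy = np.fft.fftfreq(shape[0])[:, None]  # cycles per pixel along rows
    fx = np.fft.fftfreq(shape[1])[None, :]  # cycles per pixel along columns
    # Fourier transform of the unit-mass Gaussian of standard deviation sigma
    transfer = np.exp(-2.0 * np.pi ** 2 * sigma ** 2 * (fx ** 2 + fy ** 2))
    # Convolution = multiplication in the frequency domain
    return np.real(np.fft.ifft2(np.fft.fft2(noise) * transfer))


smooth = gaussian_correlated_noise((256, 256), sigma=2.0, seed=0)
```

The transfer function equals 1 at zero frequency, so the mean gray level is preserved while the variance drops. Before computing bilevels, the result would still have to be rescaled to integer gray levels.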
Table 4. Uniform white noise image, after convolution with a Gaussian of parameter σ, k = 12, 8-connectivity



These “non-natural” images lead to two remarks about the 1/s^α law. First, the law does not characterize natural images, even though correlated noise looks similar to a natural texture. Second, the size law could be related to the way the optical photographic device captures the image, as the behavior of noise convolved with a Gaussian suggests. More precisely, we observed that convolution with a Gaussian increases α for images where the initial α is small (such as text and synthetic images), whereas it tends to decrease α when it is initially bigger than 2 (noise).

Another, more satisfactory explanation of this power law is scale invariance. The assumption that natural images are scale invariant, so that all observed statistics should be scale (zoom) invariant, has been confirmed by the shape of the power spectrum mentioned in the introduction (see [7]). It is also supported by the fact that some statistics are preserved when the image is shrunk (see [20], [16]). Our experiments also confirm this assumption, since scale invariance yields the 1/s² law. Indeed, suppose that the total area occupied by regions with area between A and A' is the same as the total area occupied by regions with area between tA and tA', for all t, A, A'. Then the power law with exponent 2 is the only acceptable size distribution.
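The last claim can be checked directly. Write f for the area density (the notation of Section 2). The invariance assumption and the substitution s = tu give

```latex
\int_A^{A'} s\,f(s)\,ds \;=\; \int_{tA}^{tA'} s\,f(s)\,ds
\;=\; \int_A^{A'} t^2\,u\,f(tu)\,du
\qquad \text{for all } t > 0,\ 0 < A < A'.
```

Differentiating with respect to A' yields A' f(A') = t² A' f(tA'), that is, f(tu) = t^{−2} f(u). Taking u = 1 gives f(t) = f(1)/t², the power law with exponent 2.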

3 Size of sections and the BV norm of natural images

The aim of this section is to give a computational tool for deciding whether an image can belong to the space BV of functions with bounded variation. The BV assumption for natural images is widespread, in applications ranging from image restoration ([21], [22]) to image compression.

The space BV is the space of functions for which the sum of the perimeters of the level sets is finite. BV is of great importance in image modeling, since an image as simple as a white disk on a black background is not in any Sobolev space but belongs to BV. However, if the disk is replaced by an object whose boundary has infinite length, such as a bidimensional Cantor set, the corresponding function is no longer in BV. There is also another way for a function not to be in BV: each of its level sets may have finite perimeter while the sum of those perimeters tends to infinity. According to our analysis, this is the case for natural images. In a sense, small objects in natural images are too numerous for the function to be in BV.

3.1 A lower bound for the BV norm 

We consider a bounded image I ∈ BV(Ω), where BV(Ω) is the space of functions with bounded variation ([25], [6]) on a domain Ω ⊂ ℝ² (e.g. a rectangle). For λ ∈ ℝ, define the level set of I with level λ by

$$ \chi_\lambda I = \{x : I(x) \ge \lambda\}. $$

Recall that a function is of bounded variation if, for almost every λ ∈ ℝ, χ_λ I is a set with finite perimeter and, denoting this perimeter by per(χ_λ I), the quantity

$$ \|I\|_{BV} = \int_{\mathbb{R}} \mathrm{per}(\chi_\lambda I)\, d\lambda \qquad (1) $$

is finite. For a precise definition of the perimeter and the essential boundary we refer to [6]. (By the coarea formula, see [6], we also have ‖I‖_BV = ∫_Ω |DI|.)

In addition, by the classical isoperimetric inequality, we have for every set O with finite perimeter

$$ \mathrm{per}(O) \ge 2\pi^{1/2}\, \nu(O)^{1/2}, \qquad (2) $$

where ν(O) denotes the Lebesgue measure of O. In the following, we consider sections of the image. We always assume that the image I satisfies 0 ≤ I(x) ≤ C. We first fix two parameters γ, λ with 0 ≤ λ < γ. For any n ∈ ℕ, we consider the bilevel sets of I

$$ \{x : \lambda + (n-1)\gamma \le I(x) < \lambda + n\gamma\} = \chi_{\lambda+(n-1)\gamma} I \setminus \chi_{\lambda+n\gamma} I. $$

We call a (γ, λ)-section of I any set that is a connected component of a bilevel set χ_{λ+(n−1)γ} I \ χ_{λ+nγ} I for some n. We denote each of them by S_{γ,λ,i}, for i ∈ J(γ, λ), a set of indices. Notice that the (γ, λ)-sections are disjoint and that their union is the image domain Ω:

$$ \Omega = \bigcup_{i \in J(\gamma,\lambda)} S_{\gamma,\lambda,i}. \qquad (3) $$



There are several ways to define the connected components of a set with finite perimeter, since such a set is defined only up to a set of zero Lebesgue measure. We denote by H¹ the one-dimensional Hausdorff measure, that is to say the length. In the following, a Jordan curve is a simple closed curve of ℝ², i.e. the range of a continuous map c : [0, 1] → ℝ² such that c(s) ≠ c(t) for all 0 ≤ s < t < 1, and c(0) = c(1). A Jordan curve defines exactly two connected components (in the usual sense) of ℝ² \ c([0, 1]), one bounded and one unbounded. We say that a Jordan curve separates two points x and y if they do not belong to the same connected component of ℝ² \ c([0, 1]). One can prove ([8], [3]) that there is a definition of connected components for a set with finite perimeter for which the following statements hold:

Theorem 1 (and definition) 

Let O be a set with finite perimeter. 

(i) The essential boundary of O consists, up to a set of zero H¹-measure, of a countable set of noncrossing simple rectifiable closed curves c_j with finite length such that per(O) = Σ_j H¹(c_j).

(ii) Two points are in the same connected component of O if and only if, for any representation of the essential boundary by a family of Jordan curves c_j of the preceding kind, they are not separated by any of the c_j.

(iii) With this definition, the perimeter of a set with finite perimeter is the sum of the perimeters of its connected components.

We denote by J(n) ⊂ J(γ, λ) the set of indices of the sections that are connected components of χ_{λ+(n−1)γ} I \ χ_{λ+nγ} I. As an obvious consequence of Theorem 1 (iii), we have

Corollary 1

$$ \mathrm{per}(\chi_{\lambda+(n-1)\gamma} I \setminus \chi_{\lambda+n\gamma} I) = \sum_{i \in J(n)} \mathrm{per}(S_{\gamma,\lambda,i}). $$

When A is a set with finite perimeter, we have ([6])

$$ \mathrm{per}(A) = \|\mathbf{1}_A\|_{BV}. $$



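Lemma 1 If B ⊂ A are two sets with finite perimeter, then per(A \ B) ≤ per(A) + per(B).

Proof Indeed, by the subadditivity of the BV norm, we deduce from

$$ \mathbf{1}_{A \setminus B} = \mathbf{1}_A - \mathbf{1}_B $$

that

$$ \mathrm{per}(A \setminus B) \le \mathrm{per}(A) + \mathrm{per}(B). \qquad \square $$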

In the following theorem, we analyze the statistics of the sizes of sections as follows. We fix γ, that is, the overall contrast of the considered sections. For each 0 ≤ λ < γ, we count all sections with an area between s and s + ds. In other terms, we consider the integer

$$ \mathrm{Card}\{i : s \le |S_{\gamma,\lambda,i}| < s + ds\}. $$

We average this number over all λ in [0, γ] and assume that the average has a density f(γ, s) with respect to s. In other terms,

$$ \frac{1}{\gamma}\int_0^{\gamma} \mathrm{Card}\{i : s \le |S_{\gamma,\lambda,i}| < s + ds\}\, d\lambda = f(\gamma, s)\, ds. \qquad (4) $$




Theorem 2 Assume that there exists some γ > 0 such that (4) holds, i.e. the average number of sections with area s, for 0 ≤ λ < γ, has a density f(γ, s). Then there is a constant c, not depending on I, such that

$$ \|I\|_{BV} \ge c \int_0^{\nu(\Omega)} s^{1/2} f(\gamma, s)\, ds. \qquad (5) $$

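Proof Applying Lemma 1 (with A = χ_{λ−γ} I and B = χ_λ I ⊂ A) and then Corollary 1, we get

$$
\begin{aligned}
\|I\|_{BV} &= \int_{\mathbb{R}} \mathrm{per}(\chi_\lambda I)\,d\lambda
 = \frac{1}{2}\Big[\int_{\mathbb{R}} \mathrm{per}(\chi_\lambda I)\,d\lambda + \int_{\mathbb{R}} \mathrm{per}(\chi_{\lambda-\gamma} I)\,d\lambda\Big] \\
&\ge \frac{1}{2}\int_{\mathbb{R}} \mathrm{per}(\chi_{\lambda-\gamma} I \setminus \chi_\lambda I)\,d\lambda
 = \frac{1}{2}\sum_{n\in\mathbb{Z}}\int_{n\gamma}^{(n+1)\gamma} \mathrm{per}(\chi_{\lambda-\gamma} I \setminus \chi_\lambda I)\,d\lambda \\
&= \frac{1}{2}\int_0^{\gamma}\sum_{n\in\mathbb{Z}} \mathrm{per}(\chi_{\lambda+(n-1)\gamma} I \setminus \chi_{\lambda+n\gamma} I)\,d\lambda
 = \frac{1}{2}\int_0^{\gamma}\sum_{i\in J(\gamma,\lambda)} \mathrm{per}(S_{\gamma,\lambda,i})\,d\lambda.
\end{aligned}
$$

By the isoperimetric inequality (2), we therefore obtain

$$ \|I\|_{BV} \ge \pi^{1/2}\int_0^{\gamma}\sum_{i\in J(\gamma,\lambda)} |S_{\gamma,\lambda,i}|^{1/2}\,d\lambda. $$

Applying the Fubini-Tonelli theorem, some slicing and the assumption (4), we get

$$ \|I\|_{BV} \ge \pi^{1/2}\int_0^{\gamma} d\lambda \int_0^{\nu(\Omega)} s^{1/2}\,\mathrm{Card}\{i\in J(\gamma,\lambda) : s \le |S_{\gamma,\lambda,i}| < s+ds\} = \gamma\,\pi^{1/2}\int_0^{\nu(\Omega)} s^{1/2} f(\gamma,s)\,ds, $$

which is (5) with c = γπ^{1/2}. □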
We can repeat the preceding analysis by assuming now that

$$ \frac{1}{\gamma}\int_0^{\gamma} \mathrm{Card}\{i : p \le \mathrm{per}(S_{\gamma,\lambda,i}) < p + dp\}\, d\lambda = g(\gamma, p)\, dp. \qquad (6) $$

Then we have the analog of Theorem 2 for the perimeters of sections:

Theorem 3 Assume that there exists some γ > 0 such that (6) holds, i.e. the average number of sections with perimeter p, for 0 ≤ λ < γ, has a density g(γ, p). Then

$$ \|I\|_{BV} \ge \frac{\gamma}{2} \int_0^{+\infty} p\, g(\gamma, p)\, dp. \qquad (7) $$

Proof The proof is essentially the same as for Theorem 2.

3.2 Application to natural images 

In this section, we draw the consequences of Theorems 2 and 3 for the images analyzed in Section 2. According to the results of that section, we can assume that the considered images satisfy

$$ f(\gamma, s) = \frac{K}{s^{\alpha}}, \qquad (8) $$

$$ g(\gamma, p) = \frac{K'}{p^{\beta}}, \qquad (9) $$

for some constants K, K' > 0 and exponents α > 0, β > 0. This law has been checked experimentally for several values of γ, with k ranging from 8 to 20. We also checked that α was almost unchanged when the bilevels were defined starting not from gray level 0 but from some gray level smaller than the width of one bilevel (that is to say, in the continuous model, for different values of λ). By Theorem 2 we have

$$ \|I\|_{BV} \ge c\,K \int_0^{\nu(\Omega)} s^{1/2 - \alpha}\, ds = +\infty \quad \text{if } \alpha \ge \tfrac{3}{2}, $$

and in the same way, by Theorem 3,

$$ \|I\|_{BV} \ge \frac{\gamma}{2}\,K' \int_0^{1} p^{1-\beta}\, dp = +\infty \quad \text{if } \beta \ge 2. $$



Thus, if we admit that (8) and (9) indeed hold for natural images when s → 0, as the experiments of Section 2 indicate, the considered images are not in BV if α ≥ 3/2 or β ≥ 2. This strong assumption about small-scale behavior is motivated by the goodness of the fit at every scale and by the stability of the fit with respect to T_min (see Section 2, Table 2). Notice, however, that α > 2, which happens for several of the considered images, is not compatible with a finite image area, since then ∫_0^{ν(Ω)} s f(γ, s) ds = +∞. As Vicent Caselles and Stéphane Mallat suggested to us, this raises the question of whether the area is correctly measured by counting covering pixels. If a region is very ragged, the number of covering pixels may be related to its perimeter as well, in which case the estimate of g(γ, p) is more reliable. This number could also be related to a fractional Hausdorff measure.
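To make the small-scale divergence concrete, the following sketch evaluates in closed form the truncated integral ∫_ε^1 s^{1/2−α} ds from the area bound as the cutoff ε shrinks. The helper name and the upper limit 1 are our choices. With α = 1.4 the values approach 1/(3/2 − α) = 10, whereas with α = 1.9, the value reported below for Lena, they grow without bound. For the perimeter bound, replace the exponent by 1 − β.

```python
import math


def truncated_power_integral(exponent: float, eps: float, upper: float = 1.0) -> float:
    """Closed form of the integral of s**exponent over [eps, upper], 0 < eps < upper."""
    if exponent == -1.0:
        return math.log(upper / eps)
    k = exponent + 1.0
    return (upper ** k - eps ** k) / k


for alpha in (1.4, 1.9):
    # Integrand of the area bound: s**(1/2) * s**(-alpha)
    values = [truncated_power_integral(0.5 - alpha, eps) for eps in (1e-2, 1e-4, 1e-6)]
    print(alpha, [round(val, 1) for val in values])
```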

We point out that wavelet coefficients (see [15], [13] for an introduction to wavelet decompositions) also give a way to decide whether or not an image belongs to BV. Let (c_k) be the wavelet coefficients of the image I, ordered as a nonincreasing sequence of absolute values, and suppose that the wavelets have compact support. We say that the c_k are in ℓ¹ if Σ_k |c_k| < +∞, and that they are in weak-ℓ¹ if there exists a constant C such that |c_k| ≤ C/k. Obviously ℓ¹ is included in weak-ℓ¹. It is quite easy to prove that if the c_k are in ℓ¹, then I is in BV. In the other direction, Cohen et al. [4] recently proved that if I is in BV, then the c_k are in weak-ℓ¹. Thus one can decide whether or not an image belongs to BV by looking at the decay of its wavelet coefficients, except when they decrease like 1/k, which is often the case ([12]). It is also worth noticing that the wavelet coefficients of the characteristic function of a simple shape already decay like 1/k. We do not present a precise comparison between the two criteria here. Let us just mention that for the baboon image (Figure 1), both methods agree: this image is not in BV. For the well-known image of Lena, our approach gives α = 1.9 (for k = 16), which suggests that Lena is not in BV, whereas according to the wavelet approach the image is in BV. In fact, according to our analysis, natural images are not in BV. One may of course object that there is an inner scale cutoff, but our results indicate that the BV norm of continuous representations of natural images blows up as we consider smaller and smaller scales.
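The two sequence conditions can be monitored on any finite set of coefficients. The sketch below is our own helper and is independent of any particular wavelet transform. It sorts the magnitudes in nonincreasing order and returns the partial sum Σ|c_k| together with max_k k|c_k|. For the borderline decay c_k = 1/k, the second quantity stays at 1 while the partial sums keep growing like log N. This is the case in which the wavelet criterion is inconclusive.

```python
import numpy as np


def l1_and_weak_l1(coeffs):
    """Return (sum_k |c_k|, max_k k*|c_k|) for magnitudes sorted nonincreasingly."""
    c = np.sort(np.abs(np.asarray(coeffs, dtype=float)))[::-1]
    k = np.arange(1, c.size + 1)
    return float(c.sum()), float(np.max(k * c))


for n in (10**3, 10**6):
    harmonic = 1.0 / np.arange(1, n + 1)  # borderline decay c_k = 1/k
    total, weak = l1_and_weak_l1(harmonic)
    print(n, round(total, 3), round(weak, 6))
```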



4 Conclusions 

We observed experimentally that the size distribution of homogeneous parts in digital natural images follows a power law. This power law confirms the scale invariance of natural images. Moreover, it allows us to show that, provided the power law remains valid at small, non-observable scales, most natural images are not in the space BV of functions with bounded variation.
Abstract. Edges are viewed as statistical outliers with respect to local image gradient magnitudes. Within local image regions we compute a robust statistical measure of the gradient variation and use it in an anisotropic diffusion framework to determine a spatially varying “edge-stopping” parameter σ. We show how to determine this parameter for two edge-stopping functions described in the literature (Perona-Malik and the Tukey biweight). Smoothing of the image is related to the local texture. In regions of low texture, small gradient values may be treated as edges, whereas in regions of high texture, large gradient magnitudes are necessary before an edge is preserved. Intuitively, these results have similarities with human perceptual phenomena such as masking and “popout”. Results are shown on a variety of standard images.



1 Introduction 

Anisotropic diffusion has been widely used for “edge-preserving” smoothing of images. Little attention, however, has been paid to defining exactly what is meant by an “edge.” In the traditional formulation of Perona and Malik [8], edges are related to pixels with large gradient magnitudes, and an anisotropic smoothing function is one that inhibits smoothing across such boundaries. The effect of this smoothing is determined by a parameter, σ, which implicitly defines what is meant by an edge. This paper addresses how σ can be determined automatically from the image data in such a way that edges correspond to statistical outliers with respect to local image gradients. With this method, σ varies across the image, and hence what is considered to be an edge depends on local statistical properties of the image.

Consider for example the image in Figure 1. Regions A and B illustrate areas 
where there is little gradient variation and the fairly small gradient magnitudes 
of the features are locally significant. Intuitively, we would say that the eyebrow 
and the shoulder crease are significant image structures. In contrast, region C is 
highly textured and there is a great deal of variation in the gradient magnitudes. 
Intuitively, in this region, the gradient magnitudes of features like those in regions 
Fig. 1. Consider the image regions (A, B, C) in the upper left image. The middle row shows each image region in detail, while the bottom row shows the gradient magnitude for each region. The faint image structures in regions A and B are statistically significant with respect to the variation of intensity within the regions. The same variation in the highly textured region C would not be statistically significant, due to the much larger image variation.



A and B might be considered insignificant. To be considered an edge in region 
C we would like the gradient magnitude to be much larger. 

Here we adopt the robust statistical interpretation of anisotropic diffusion elaborated in [1]. Anisotropic diffusion is viewed as a robust statistical procedure that estimates a piecewise smooth image from noisy input data. That work formalized the relationship between the “edge-stopping” function in the anisotropic diffusion equation and the error norm and influence function in a robust estimation framework. This robust statistical interpretation provides a principled means for defining and detecting the boundaries (edges) between the piecewise smooth regions in an image that has been smoothed with anisotropic diffusion. Edges are considered statistical outliers in this framework.

The robust statistical approach also provides a framework to locally define 
edges and stopping functions, as demonstrated in this paper (see [11] for a dif- 
ferent approach to spatially adaptive anisotropic diffusion). In particular, the 
$\sigma$ parameter in the edge-stopping function also has a statistical interpretation.
This statistical interpretation gives, among other properties, a completely
automatic diffusion algorithm, since all the parameters are computed from the image.
Our approach is to compute a statistically robust local measure of the brightness
variation within image regions. From this we obtain a local definition of edges
and a space-variant edge-stopping function.

2 Review 

We briefly review the traditional anisotropic diffusion formulation as presented 
by Perona and Malik [8]. 

2.1 Anisotropic diffusion: Perona-Malik formulation

Diffusion algorithms smooth images via a partial differential equation (PDE). For
example, consider applying the isotropic diffusion equation (the heat equation)
given by $\frac{\partial I(x,y,t)}{\partial t} = \operatorname{div}(\nabla I)$, using the original (degraded/noisy) image $I(x,y,0)$
as the initial condition, where $I(x,y,0): \mathbb{R}^2 \to \mathbb{R}^+$ is an image in the continuous
domain, $(x,y)$ specifies spatial position, $t$ is an artificial time parameter, and
$\nabla I$ is the image gradient. Modifying the image according to this isotropic
diffusion equation is equivalent to filtering the image with a Gaussian filter.
Perona and Malik [8] replace the classical isotropic diffusion equation with

$$\frac{\partial I(x,y,t)}{\partial t} = \operatorname{div}\big(g(\|\nabla I\|,\sigma)\,\nabla I\big), \quad (1)$$

where $\|\nabla I\|$ is the gradient magnitude, $g(\|\nabla I\|,\sigma)$ is an “edge-stopping”
function and $\sigma$ is a scale parameter. This function is chosen to satisfy $g(x,\sigma) \to 0$
when $x \to \infty$ so that the diffusion is “stopped” across edges.

2.2 Perona-Malik discrete formulation 

Perona and Malik discretized their anisotropic diffusion equation as follows:

$$I_s^{t+1} = I_s^t + \frac{\lambda}{|\eta_s|}\sum_{p\in\eta_s} g(\nabla I_{s,p},\sigma)\,\nabla I_{s,p}, \quad (2)$$

where $I_s^t$ is a discretely-sampled image, $s$ denotes the pixel position in a discrete,
two-dimensional grid, and $t$ now denotes discrete time steps (iterations). The
constant $\lambda \in \mathbb{R}^+$ is a scalar that determines the rate of diffusion, $\eta_s$ represents
the spatial neighborhood of pixel $s$, and $|\eta_s|$ is the number of neighbors (usually
4, except at the image boundaries). Perona and Malik linearly approximated the
image gradient in a particular direction as

$$\nabla I_{s,p} = I_p - I_s^t, \quad p \in \eta_s. \quad (3)$$
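The explicit update in (2) with the differences (3) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes a 4-neighbourhood, replicates border pixels (so differences toward the outside are zero while $|\eta_s|$ is kept at 4), and uses a Lorentzian-type $g$ only for the usage example; the names `pm_step` and `g_lorentz` are hypothetical.

```python
import numpy as np

def pm_step(I, g, sigma, lam):
    """One explicit iteration of the discrete scheme (2) with differences (3).

    I     : 2-D float array, the current image I^t
    g     : edge-stopping function g(x, sigma), vectorised over arrays
    sigma : scale parameter passed to g
    lam   : rate constant lambda
    """
    # Replicate the border so differences toward the outside are zero.
    P = np.pad(I, 1, mode="edge")
    c = P[1:-1, 1:-1]
    # grad I_{s,p} = I_p - I_s for the north, south, west and east neighbours
    diffs = (P[:-2, 1:-1] - c, P[2:, 1:-1] - c, P[1:-1, :-2] - c, P[1:-1, 2:] - c)
    flux = sum(g(d, sigma) * d for d in diffs)
    # Simplification: |eta_s| is taken as 4 everywhere, including the border.
    return I + (lam / 4.0) * flux

def g_lorentz(x, sigma):
    # Perona-Malik / Lorentzian weight, see eq. (6)
    return 1.0 / (1.0 + x**2 / (2.0 * sigma**2))

# Usage: a noisy vertical step edge
rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[:, 16:] = 1.0
noisy = img + 0.05 * rng.standard_normal(img.shape)
out = noisy.copy()
for _ in range(20):
    out = pm_step(out, g_lorentz, sigma=0.1, lam=1.0)
```

With $\lambda \le 1$ and $0 < g \le 1$, each update is a convex combination of a pixel and its neighbours, which keeps this explicit scheme stable.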

Qualitatively, the effect of anisotropic diffusion is to smooth the original
image while preserving brightness discontinuities. The choice of $g(x,\sigma)$ and the
value of $\sigma$ can greatly affect the extent to which discontinuities are preserved.
2.3 Related Work

In related work, a number of authors have explored the estimation of the scale 
at which to estimate edges in images [2,5]. These methods find the optimal local 
scale for detecting edges with Gaussian filters; they do not explicitly use local 
image statistics. The approach described here might be augmented using these 
ideas to determine the size of the local area within which to compute image 
statistics. 

Marimont and Rubner [6] computed local statistics of zero-crossings and used 
these to define the probability of a pixel belonging to an edge. Liang and Wang 
[4] also used the statistics of zero-crossings to set a local noise measure in an 
anisotropic diffusion formulation. 

In contrast, the work here provides a robust statistical view which allows a 
principled choice of both the g-function and the scale parameter. Related to this 
is work on human perception that models feature saliency using a statistical test 
for outliers [9]. 

3 Robust Statistical View 

For the majority of pixels in Figure 1 A, the image gradient values can be
approximately modeled as being constant (zero) with random Gaussian noise. The
large gradient values due to the image feature, however, are statistical “outliers”
[3] with respect to the Gaussian distribution; the distribution of these outliers is
unknown. We seek a function $g(x,\sigma)$ and a scale parameter $\sigma$ that will
appropriately smooth the image when the variation in the gradient is roughly Gaussian
and will inhibit smoothing when the gradient can be viewed as an outlier.

First we need to relate the form of the $g$-functions used for anisotropic
diffusion to the tools used in robust statistics (see [1] for details). From a robust
statistical perspective the goal of anisotropic smoothing is to iteratively find an
image $I$ that satisfies the following optimization criterion:

$$\min_I \sum_{s\in I}\sum_{p\in\eta_s}\rho(I_p - I_s,\sigma), \quad (4)$$

where $\rho(\cdot)$ is a robust error function and $\sigma$ is a “scale” parameter.

In this formulation large image differences $|I_p - I_s|$ are assumed to be outliers
which should not have a large effect on the solution. To analyze the behavior of
a given $\rho$-function with respect to outliers, we consider its derivative (denoted
$\psi$), which is proportional to its influence function [3]. This function characterizes
the bias that a particular measurement has on the solution, and by analyzing the
shape of this function we can infer the behavior of a particular robust $\rho$-function
with respect to outliers.

In [1] (see also [12] for a related approach) it was shown that

$$g(x,\sigma)\,x = \psi(x,\sigma) = \rho'(x,\sigma). \quad (5)$$
Fig. 2. Lorentzian error norm and the Perona-Malik g stopping function. [Panel label: $g(x,\sigma)x = \psi(x,\sigma)$.]

Fig. 3. Tukey’s biweight. [Panel label: $g(x,\sigma)x = \psi(x,\sigma)$.]

This relationship means that we can analyze the behavior of a particular
anisotropic edge-stopping function $g$ in terms of its outlier rejection properties by
examining the influence function $\psi(x,\sigma)$.

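For example, consider the edge-stopping function proposed by Perona and
Malik [8]

$$g(x,\sigma)\,x = \frac{2x}{2 + \frac{x^2}{\sigma^2}} = \psi(x,\sigma), \quad (6)$$

where $\psi(x,\sigma) = \rho'(x,\sigma)$. We can compute $\rho$ by integrating $g(x,\sigma)x$ with respect
to $x$ to derive

$$\int g(x,\sigma)\,x\,dx = \sigma^2\log\left(1 + \frac{x^2}{2\sigma^2}\right) = \rho(x,\sigma). \quad (7)$$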



This function $\rho(x,\sigma)$ is proportional to the Lorentzian error norm used in robust
statistics, and $g(x)x = \rho'(x) = \psi(x)$ is proportional to the influence function
(Figure 2).

The function $g(x,\sigma)$ acts as a “weight” and from the plot in Figure 2 we
can see that small values of $x$ (i.e. small gradient magnitudes) will receive high
weight. As we move out to the tails of this function it flattens out and the weight
assigned to some large $x$ will be roughly the same as the weight assigned to some
nearby $x + \epsilon$. This behavior is visible in the shape of the $\psi$-function, which reaches
a peak and then begins to descend. Outlying values of $x$ beyond a point receive
roughly equivalent weights and hence there is little preference for one outlying
value over another. In this sense outliers have little “influence” on the solution.
In the anisotropic diffusion context, $g(x,\sigma)x$ will be relatively small for outliers
and, hence, each iteration in (2) will produce only a small change in the image.
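The relations (5) through (7) for the Lorentzian can be checked numerically. The sketch below uses hypothetical helper names and an arbitrary $\sigma = 1$; it differentiates $\rho$ by finite differences, compares the result with $\psi$, and locates the peak of $\psi$, beyond which larger $x$ starts to lose influence. For this $\psi$ the peak lies at $x = \sqrt{2}\,\sigma$.

```python
import numpy as np

def g_lorentz(x, sigma):
    # Weight implied by eq. (6): g(x, sigma) * x = psi(x, sigma)
    return 1.0 / (1.0 + x**2 / (2.0 * sigma**2))

def psi_lorentz(x, sigma):
    # Influence function psi(x, sigma) = 2x / (2 + x^2 / sigma^2)
    return 2.0 * x / (2.0 + x**2 / sigma**2)

def rho_lorentz(x, sigma):
    # Error norm of eq. (7): sigma^2 * log(1 + x^2 / (2 sigma^2))
    return sigma**2 * np.log1p(x**2 / (2.0 * sigma**2))

sigma = 1.0
x = np.linspace(-6.0, 6.0, 2001)
# Central differences of rho should reproduce psi (endpoints dropped).
drho = np.gradient(rho_lorentz(x, sigma), x)
max_err = np.max(np.abs(drho - psi_lorentz(x, sigma))[1:-1])
# Location where the influence stops growing and starts to redescend
peak = x[np.argmax(psi_lorentz(x, sigma))]
print(max_err, peak)
```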
In [1] a more robust edge-stopping function was derived from Tukey’s biweight
$\rho$-error:

$$\rho(x,\sigma) = \begin{cases} \frac{x^2}{\sigma^2} - \frac{x^4}{\sigma^4} + \frac{x^6}{3\sigma^6}, & |x| \le \sigma,\\ \frac{1}{3}, & \text{otherwise},\end{cases} \quad (8)$$

$$\psi(x,\sigma) = \begin{cases} x\left(1 - (x/\sigma)^2\right)^2, & |x| \le \sigma,\\ 0, & \text{otherwise},\end{cases} \quad (9)$$

$$g(x,\sigma) = \begin{cases} \frac{1}{2}\left(1 - (x/\sigma)^2\right)^2, & |x| \le \sigma,\\ 0, & \text{otherwise}.\end{cases} \quad (10)$$

The functions $g(x,\sigma)$, $\psi(x,\sigma)$ and $\rho(x,\sigma)$ are plotted in Figure 3. The influence
of outliers drops off more rapidly than with the Lorentzian function, and
the influence goes to zero after a fixed value (a hard redescending function).
These properties result in sharper boundaries than those obtained with the
Perona-Malik/Lorentzian function [1].

4 Local Measure of Edges

Both functions defined in the previous section reduce the influence of large
gradient magnitudes on the smoothed image. The point at which gradient values
begin to be treated as outliers is dependent on the parameter $\sigma$. In this section
we consider how to globally and locally compute an estimate of $\sigma$ directly from
the image gradients. The main idea is that $\sigma$ should characterize the variance
of the majority of the data within a region. So, for example, in Figure 1 A, $\sigma$
should characterize the amount of variation in the gradients at all locations
except where the feature is located. Outliers will then be determined relative to
this background variation.

In deriving $\sigma$ we appeal to tools from robust statistics to automatically
estimate the “robust scale,” $\sigma_e$, of the image as [10]

$$\sigma_e = 1.4826\,\mathrm{MAD}(\nabla I) = 1.4826\,\operatorname{median}_I\Big(\big|\,\|\nabla I\| - \operatorname{median}_I(\|\nabla I\|)\,\big|\Big), \quad (11)$$

where “MAD” denotes the median absolute deviation and the constant is
derived from the fact that the MAD of a zero-mean normal distribution with unit
variance is $0.6745 = 1/1.4826$. We consider $\sigma_e$ to be the gradient magnitude at
which outliers begin to be downweighted.

We choose values for the scale parameters $\sigma$ to dilate each of the influence
functions so that they begin rejecting outliers at the same value: $\sigma_e$. The point
where the influence of outliers first begins to decrease occurs when the derivative
of the $\psi$-function is zero. For the Lorentzian function this occurs at $\sigma_e = \sqrt{2}\,\sigma$,
and for the Tukey function it occurs at $\sigma_e = \sigma/\sqrt{5}$. Defining $\sigma$ with respect to $\sigma_e$
in this way, we plot the influence functions for a range of values of $x$ in Figure 4a.
Note how each function begins reducing the influence of measurements at the
same point.
Fig. 4. Lorentzian and Tukey $\psi$-functions. (a) Values of $\sigma$ chosen as a function of $\sigma_e$
so that outlier “rejection” begins at the same value for each function; (b) the functions
aligned and scaled.



We also scale the influence functions so that they return values in the same
range. To do this we take $\lambda$ in (2) to be one over the value of $\psi(\sigma_e,\sigma)$. The
scaled $\psi$-functions are plotted in Figure 4b.

Now we can directly compare the results of anisotropic smoothing with the 
different edge-stopping functions. The Tukey function gives zero weight to out- 
liers whose magnitude is above a certain value while the Lorentzian (or Perona- 
Malik) downweights outliers but still gives them some weight. 
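A minimal sketch of the global robust scale (11) and of the alignment and scaling behind Figure 4. Assumptions: gradient magnitudes come from forward differences as in (3), the test image is synthetic, and the helper names are hypothetical. For the Lorentzian, $\psi$ peaks at $x = \sqrt{2}\,\sigma$, so $\sigma = \sigma_e/\sqrt{2}$; for the Tukey function (9), $\psi$ peaks at $x = \sigma/\sqrt{5}$, so $\sigma = \sqrt{5}\,\sigma_e$.

```python
import numpy as np

def grad_mag(I):
    # Forward differences, mirroring eq. (3); last row/column get zero difference.
    dx = np.diff(I, axis=1, append=I[:, -1:])
    dy = np.diff(I, axis=0, append=I[-1:, :])
    return np.hypot(dx, dy)

def robust_scale(v):
    """sigma_e = 1.4826 * MAD(v), eq. (11)."""
    return 1.4826 * np.median(np.abs(v - np.median(v)))

def psi_lorentz(x, sigma):
    return 2.0 * x / (2.0 + x**2 / sigma**2)

def psi_tukey(x, sigma):
    # eq. (9): x (1 - (x/sigma)^2)^2 for |x| <= sigma, 0 otherwise
    return np.where(np.abs(x) <= sigma, x * (1.0 - (x / sigma) ** 2) ** 2, 0.0)

# Synthetic test image: Gaussian noise plus a vertical step edge.
rng = np.random.default_rng(1)
img = rng.normal(0.0, 0.02, size=(64, 64))
img[:, 32:] += 1.0
sigma_e = robust_scale(grad_mag(img))

# Dilate each psi so that it starts to redescend at x = sigma_e.
s = {"lorentz": sigma_e / np.sqrt(2.0), "tukey": np.sqrt(5.0) * sigma_e}
# lambda = 1 / psi(sigma_e, sigma): both scaled psi-functions equal 1 at sigma_e.
lam = {"lorentz": 1.0 / psi_lorentz(sigma_e, s["lorentz"]),
       "tukey": 1.0 / psi_tukey(sigma_e, s["tukey"])}

x = np.linspace(0.0, 4.0 * sigma_e, 4001)
scaled_l = lam["lorentz"] * psi_lorentz(x, s["lorentz"])
scaled_t = lam["tukey"] * psi_tukey(x, s["tukey"])
peak_l, peak_t = x[np.argmax(scaled_l)], x[np.argmax(scaled_t)]
```

Beyond $x = \sqrt{5}\,\sigma_e$ the scaled Tukey curve is exactly zero, while the scaled Lorentzian curve stays positive, which is the difference described above.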



4.1 Spatially Varying $\sigma$

In previous work we took the region for computing $\sigma_e$ to be the entire image. This
approach works well when edges are distributed homogeneously across the image,
but this is rarely the case. Here we explore the computation of this measure in
image patches. In particular, we consider computing a local scale $\sigma_l(x,y)$, which
is a function of spatial position, in $n \times n$ pixel patches at every location in the
image. We take this value to be the larger of the $\sigma_e$ estimated for the entire
image and the value in the local patch. Then $\sigma_l(x,y)$ is defined as

$$\sigma_l(x,y) = \max\left(\sigma_e,\; 1.4826\,\mathrm{MAD}_{-\frac{n}{2}\le i,j\le\frac{n}{2}}\big(\nabla I_{x+i,y+j}\big)\right). \quad (12)$$

In practice $\sigma_e$ provides a reasonable lower bound on the overall spatial image
variation, and setting $\sigma_l$ to the maximum of the global and local variation
prevents the amplification of noise in relatively homogeneous image regions.
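Equation (12) can be sketched with NumPy's `sliding_window_view` (NumPy 1.20 or later). This is an illustrative sketch, not the authors' implementation: border handling by reflection, forward-difference gradients and the helper names are assumptions, and memory use grows as $H \cdot W \cdot n^2$.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def grad_mag(I):
    # Forward differences as in eq. (3); an assumption of this sketch.
    dx = np.diff(I, axis=1, append=I[:, -1:])
    dy = np.diff(I, axis=0, append=I[-1:, :])
    return np.hypot(dx, dy)

def local_scale(gm, n=15):
    """Return (sigma_l, sigma_e) following eq. (12) for an odd window size n."""
    sigma_e = 1.4826 * np.median(np.abs(gm - np.median(gm)))
    r = n // 2
    # Reflect-pad so every pixel has a full n x n window.
    win = sliding_window_view(np.pad(gm, r, mode="reflect"), (n, n))
    med = np.median(win, axis=(-2, -1), keepdims=True)
    mad = np.median(np.abs(win - med), axis=(-2, -1))
    # Never go below the global estimate, as in (12).
    return np.maximum(sigma_e, 1.4826 * mad), sigma_e

# Hypothetical example: nearly flat left half, strongly textured right half.
rng = np.random.default_rng(2)
img = np.empty((64, 64))
img[:, :32] = rng.normal(0.0, 0.01, size=(64, 32))
img[:, 32:] = rng.normal(0.0, 0.2, size=(64, 32))
sig_l, sigma_e = local_scale(grad_mag(img))
```

In the textured half, $\sigma_l$ rises well above $\sigma_e$, whereas in the flat half it stays at the global floor.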

Figure 5 shows the results of estimating $\sigma_l$ in 15 × 15 pixel patches. Bright
areas have higher values of $\sigma_l$ and correspond to more textured image regions.

To see the effects of the spatially varying $\sigma_l$, consider the results in Figure 6.
The images show the results of applying diffusion using the $g(x,\sigma)$ corresponding
to the Tukey biweight function. The top row uses a fixed value of $\sigma_e$ estimated
over the entire image, while the bottom row shows the results with a spatially
varying $\sigma_l$. We can detect edges in the smoothed images very simply by detecting
those points that are treated as outliers by the given $\rho$-function. Figure 6 shows
the outliers (edge points) in each of the images, where outliers are given by those
points having $|\nabla I_{x,y}| > \sigma_e$ (global, first row) or $|\nabla I_{x,y}| > \sigma_l(x,y)$ (local, second
row).

Fig. 5. Local estimate of scale, $\sigma_l(x,y)$. Bright areas in (b) correspond to larger values
of $\sigma_l$.

In areas containing little texture the results are identical, since in these areas
the $\sigma$ estimated locally is likely to be less than $\sigma_e$ and hence $\sigma_l$ is set to
$\sigma_e$. The differences become apparent in the textured regions of the image. A
detail is shown for a region of hair. With a fixed global $\sigma_e$, discontinuities are
detected densely in the hair region, as the large gradients are considered outliers
with respect to the rest of the image, which has relatively few large gradients.
With the spatially varying $\sigma_l$, these regions are smoothed more heavily and only
the statistically significant discontinuities remain.
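Putting the pieces together, the experiment described above (Tukey smoothing with a spatially varying scale, then marking outliers as edges) might look roughly as follows. This is a hedged sketch under assumptions not stated in the text: $\sigma = \sqrt{5}\,\sigma_l$ is evaluated with the centre pixel's value, $\lambda$ is held at 1 for stability instead of the $1/\psi(\sigma_e,\sigma)$ scaling (whose numerical value depends on the intensity units), gradients are forward differences, and all names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def grad_mag(I):
    dx = np.diff(I, axis=1, append=I[:, -1:])
    dy = np.diff(I, axis=0, append=I[-1:, :])
    return np.hypot(dx, dy)

def local_scale(gm, n=15):
    # sigma_l of eq. (12): local MAD scale, floored at the global sigma_e
    sigma_e = 1.4826 * np.median(np.abs(gm - np.median(gm)))
    win = sliding_window_view(np.pad(gm, n // 2, mode="reflect"), (n, n))
    med = np.median(win, axis=(-2, -1), keepdims=True)
    mad = np.median(np.abs(win - med), axis=(-2, -1))
    return np.maximum(sigma_e, 1.4826 * mad)

def g_tukey(x, sigma):
    # eq. (10); sigma may be an array of per-pixel scales
    return np.where(np.abs(x) <= sigma, 0.5 * (1.0 - (x / sigma) ** 2) ** 2, 0.0)

def tukey_step(I, sigma, lam=1.0):
    P = np.pad(I, 1, mode="edge")
    c = P[1:-1, 1:-1]
    flux = np.zeros_like(I)
    for d in (P[:-2, 1:-1] - c, P[2:, 1:-1] - c, P[1:-1, :-2] - c, P[1:-1, 2:] - c):
        flux += g_tukey(d, sigma) * d
    # g <= 1/2, so lam <= 2 keeps each update a convex combination
    return I + (lam / 4.0) * flux

def smooth_and_detect(I, iterations=100, n=15):
    sigma_l = local_scale(grad_mag(I), n)
    sigma = np.sqrt(5.0) * np.maximum(sigma_l, 1e-12)  # guard against a flat image
    out = I.astype(float)
    for _ in range(iterations):
        out = tukey_step(out, sigma)
    # Edges are the outliers: |grad I| > sigma_l on the smoothed image.
    edges = grad_mag(out) > sigma_l
    return out, edges

rng = np.random.default_rng(3)
img = np.zeros((64, 64))
img[:, 32:] = 1.0
noisy = img + 0.05 * rng.standard_normal(img.shape)
out, edges = smooth_and_detect(noisy)
```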

5 Experimental Results 

In this section we test the spatially varying smoothing method with both the 
Lorentzian and Tukey $\psi$-functions. Figure 7 compares the results for the Tukey
function at 500 iterations and the Lorentzian at 50 iterations. The Lorentzian
must be stopped sooner because, unlike with the Tukey function, outliers retain a
finite influence, and hence the image will eventually become oversmoothed; for a discussion
of the edge-stopping properties of the Tukey biweight function see [1]. In both
cases, note that the edges detected in the highly textured regions have a spatial
density similar to that of other regions of image structure.

Figure 8 shows a more textured image. Note that the highest scale values 
correspond to the steps in the lower middle portion of the image. The disconti- 
nuities here are smoothed while the boundaries of the people against a relatively 
uniform background are preserved. One can also see in this image the difference 
between the Lorentzian and Tukey functions in that the Tukey g-function results 
in sharper brightness discontinuities. 

The Magnetic Resonance image in Figure 9 is challenging because there are
areas of high contrast as well as detailed brain structures of very low contrast.
No single scale term will suffice for an image such as this. The results with the
Tukey function preserve much of the fine detail, and the detected edges reveal
structure in both the high- and low-contrast regions.

Fig. 6. Anisotropic smoothing with the Tukey function (500 iterations). Top row shows
smoothing with a fixed value of $\sigma_e$. Bottom row shows a spatially varying $\sigma_l$.



6 Conclusions 

One of the crucial steps in anisotropic diffusion is to define an edge, and from this 
definition, an edge stopping function. Several attempts have been reported in the 
literature, mainly dealing with global definitions. In this paper we have addressed 
the search for a local definition of edges. We have described a simple method 
for determining a spatially varying scale function based on robust statistical 
techniques. From this, we have provided a local definition of edges and a space- 
varying edge stopping function. 

A number of topics remain open. First, the only parameter left in the
proposed anisotropic diffusion algorithm is the size of the window within which $\sigma_l$
is computed. This window size should also be space-variant and needs to be automatically
determined from the image itself. This is an area of ongoing research.

We are interested in comparing the output of our simple local edge detector
with others, for example those proposed by Perona [7] or Elder and Zucker [2].
They use much more sophisticated techniques that might not be computationally 
efficient if the goal is to compute stopping functions for anisotropic diffusion. 
On the other hand, a more accurate computation of edges might be crucial for 
anisotropic diffusion applications such as the enhancement of medical images. 

Finally, it would be interesting to explore the relationship to human
perception of image features, which can be affected by the local image statistics [9].
Such an exploration may lead to a new statistical model more closely aligned
with human perception of edges.

Fig. 7. Results for both the Perona-Malik (Lorentzian) function and the Tukey function.
[Panels: 50 iterations; 500 iterations.]

Fig. 9. Magnetic Resonance Image. Results for both the Perona-Malik (Lorentzian)
function and the Tukey function. [Panels: $\sigma_l$; 50 iterations; 500 iterations.]
Abstract. Fractal Brownian motions have been introduced as a statistical
description of natural images. We analyze the Gaussian scale-space
scaling of derivatives of fractal images. On the basis of this analysis we
propose a method for estimation of the fractal dimension of images and
scale-space normalisation used in conjunction with automatic scale
selection assuming either constant energy over scale or self-similar energy
scaling.

Keywords: Fractal dimension, natural images, self-similarity, Gaussian 
scale-space, image derivatives, scale-selection, feature detection. 



1 Introduction 

In the literature [1,2, 3, 4, 5, 6, 7, 8] one finds several investigations into the fractal 
nature of natural images and in this article we will look at some scale-space 
properties of images of natural scenes. Here we use the term natural image to 
denote any image of a real world scene, which may be assumed to have a fractal 
intensity surface (or volume). A fractal function is self-similar, which means 
that if one looks at the function as a random function then its distribution is 
independent of the scale. To characterize a fractal function one uses the fractal 
dimension. 

The fractal dimension of an image intuitively describes the roughness of the
image intensity graph, and the fractal dimension of the intensity surface of 2D
images must a priori lie between 2 and 3. There has for some time been a general
consensus that 2D images of natural scenes have a fractal dimension (Hausdorff
dimension¹) $D_H = 2.5$, which is the same dimension as the classical 2D Brownian
motion. But recent studies by Bialek et al. [6] have shown that 2D images of
natural scenes² do not necessarily come from a Gaussian process and that
the fractal dimension can vary in the interval between 2 and 3.

Fractal Brownian motions (fBm) can be used as a model for images of natural
scenes. By using this model we have the freedom to model images of any fractal
dimension. The classical Brownian motion is a special case of the fBm. The fBms
are in general continuous, but not differentiable. In the limit $D_H \to 2$, the 2D
fBms generically become smooth ($C^\infty$), whereas in the limit $D_H \to 3$, the 2D
fBms become spatially uncorrelated.

¹ In this article we will use the Hausdorff dimension as the definition of the fractal
dimension. See [9] for a mathematical definition of the Hausdorff dimension.

² The studies by Bialek et al. are based on a series of images in a forest.

M. Nielsen et al. (Eds.): Scale-Space’99, LNCS 1682, pp. 271–282, 1999.
(c) Springer-Verlag Berlin Heidelberg 1999

The estimation of the fractal dimension of regions of interest in images has
several interesting prospects. It has been proposed [7,8] that the fractal
dimension of x-ray images of trabecular bone can give an indication of the
microstructure of the bone and thereby also the biomechanical strength of the bone.
This can be a helpful tool in research on osteoporosis and other bone diseases.
Other uses of the fractal dimension could be as a quality measure of surfaces
produced in different kinds of industries, e.g. metal plates, wood, etc.

Linear scale-space [10,11,12] is a mathematical formalization of the concept of 
scale (or aperture) in physical measurements. By Gaussian convolution, images 
at higher inner scales than the measurement scale can be simulated, enabling us 
to create an artificial scale-space of an image. By using this type of scale-space we 
bypass the problem of differentiability of digital images, because differentiation 
of the image in scale-space may be obtained by differentiation of the Gauss 
function prior to the convolution. 

By the use of a non-linear combination of image derivatives, called measures
of feature strength, it is possible to detect features in images [13]. In order to get
dimensionless derivatives, Florack et al. [14] have proposed normalisation of image
derivatives where the derivatives are multiplied by the scale $\sigma$, $(\partial/\partial x)_{\mathrm{norm}} =
\sigma\,\partial/\partial x$. Lindeberg [15,16,17] operates with scale-normalized derivatives in order
to detect the most significant scale for the features. He uses a normalisation which
is defined through a scaling exponent $\gamma$. In application to feature detection,
this normalisation exponent $\gamma$ depends on the feature in question. Lindeberg
determines this parameter on the basis of analysis of feature models. In this
analysis the parameter varies in the interval $[\frac{1}{2};1]$. Our intuition³ is that this
parameter must reflect the local complexity of the image, and may be modelled
through the fractal dimension of the local image. In this paper we reveal a simple
relation between the topological dimension of a feature and the fractal dimension
of the local image for determining the scale-normalisation.

We will in this paper assume that the fBms constitute a model of images
of natural scenes. Using this model we establish a method of scale-space
normalisation of derivatives, changing the analytical expression of Lindeberg’s
$\gamma$-normalisation. This expression includes the fractal dimension of the image in a
neighbourhood of the feature we want to detect. We can furthermore use this
normalisation method for estimation of the fractal dimension of images.



2 Fractal Brownian Motions and Natural Images

The 1D fBm was first defined by Mandelbrot and van Ness [18] in an integral
form, which was later restated in terms of self-similarity of a distribution
function. In this form it is straightforward to state the fBm defined over an N-D
space [5]. A function $f_H(\mathbf{x}): \mathbb{R}^N \to \mathbb{R}$ is called an N-D fBm if for all positions
$\mathbf{x} \in \mathbb{R}^N$ and all displacements $\Delta\mathbf{x} \in \mathbb{R}^N$

$$P\left(\frac{f_H(\mathbf{x}+\Delta\mathbf{x}) - f_H(\mathbf{x})}{\|\Delta\mathbf{x}\|^H} < y\right) = F(y),$$

where $F(y)$ is a cumulative distribution function and $P$ the probability. The
scaling exponent $H \in\, ]0;1[$ determines the fractal dimension of the fBm. This
definition implies that any 1D straight line through the image is a 1D fBm with
the same scaling exponent $H$.

³ Developed during discussions with Lindeberg.

One can find experimental data to support the assertion that images of
natural scenes may satisfy the definition of the fBm [1,3,4,6], but $F(y)$ is not in
general a cumulative Gaussian distribution [6] as commonly presumed [1,3].

The power spectrum of the N-D fBm $f_H(\mathbf{x})$ is given by

$$|\hat f_H(\omega)|^2 \propto |\omega|^{-\alpha}, \quad (1)$$

where $\hat f_H(\omega)$ is the Fourier transform of the fBm and $\alpha = 2H + 1$ [5,18,19],
which is independent of the dimensionality $N$. Voss and Pentland [5,19] note the
relation $D_H = N + (1 - H)$ between the Hausdorff dimension⁴ $D_H$ of an N-D
fBm $f_H(\mathbf{x})$ and the scaling exponent $H$. The estimation of $\alpha$ in (1), together
with the relation between $D_H$ and $H$, is a well-known method for estimation of
the fractal dimension of images [8].
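The spectral route mentioned above can be sketched as follows: synthesise a test image whose power spectrum follows (1) by filtering white noise in the Fourier domain, fit the log-log slope of the periodogram, and convert $\alpha$ to $H$ and $D_H$ with the relations $\alpha = 2H + 1$ and $D_H = N + (1 - H)$ exactly as stated in this section. The synthesis procedure, the frequency range of the fit and the function names are assumptions of this sketch.

```python
import numpy as np

def radial_freq(n):
    # |omega| in radians per pixel on the n x n FFT grid
    w = 2.0 * np.pi * np.fft.fftfreq(n)
    wx, wy = np.meshgrid(w, w)
    return np.hypot(wx, wy)

def synth_fractal(n, alpha, seed=0):
    """Filter white noise so that |F(omega)|^2 is proportional to |omega|^-alpha."""
    rng = np.random.default_rng(seed)
    mag = radial_freq(n)
    filt = np.zeros_like(mag)
    filt[mag > 0] = mag[mag > 0] ** (-alpha / 2.0)   # zero mean (DC removed)
    F = np.fft.fft2(rng.standard_normal((n, n))) * filt
    return np.real(np.fft.ifft2(F))

def estimate_alpha(img):
    """Least-squares slope of log power against log |omega|, excluding DC."""
    mag = radial_freq(img.shape[0])
    P = np.abs(np.fft.fft2(img)) ** 2
    keep = (mag > 0) & (mag < np.pi)
    slope, _ = np.polyfit(np.log(mag[keep]), np.log(P[keep]), 1)
    return -slope

N = 2
alpha_hat = estimate_alpha(synth_fractal(256, alpha=2.0))
H_hat = (alpha_hat - 1.0) / 2.0          # alpha = 2H + 1
D_hat = N + (1.0 - H_hat)                # D_H = N + (1 - H)
```

For $\alpha = 2$ the estimate lands close to $H = 1/2$ and $D_H = 2.5$, the classical Brownian case.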

Lindeberg [15,20] argues that in the case of N-dimensional natural images
the assumption of a uniform energy distribution at all scales leads to a power
spectrum proportional to $|\omega|^{-N}$. With reference to Field [1], Lindeberg utilizes
the assertion that the power spectrum has equal energy at all scale-invariant
frequency intervals. We find that this only coincides with $H = 1/2$ for 2D
images, which is the case where the images can be modelled by classical Brownian
motions. For Lindeberg’s proportionality to hold for other values of $H$, the value
of $H$ must be $H = (N-1)/2$, which only makes sense for $N < 3$ because of the
constraint $0 < H < 1$. So in general we cannot assume that $|\hat f_H(\omega)|^2 \propto |\omega|^{-N}$
under the assumption that N-D natural images can be modelled as fBms.

3 Scale-Space Scaling of Derivatives of Fractal Images 

In this section we will first give a short introduction to the Gaussian scale-space 
and its normalized derivatives. Then we will state our proposal for an extension 
of Lindeberg’s normalisation method based on the fractal dimension. 



3.1 Scale-Space and Normalisation 

Linear scale-space of images was independently introduced by Iijima [10], Witkin
[11] and Koenderink [12]. The linear Gaussian scale-space of an image $L(\mathbf{x}):
\mathbb{R}^N \to \mathbb{R}$ can be defined as a solution to the diffusion equation, which is given
by

$$L(\mathbf{x};t) = G(\mathbf{x};t) * L(\mathbf{x}),$$

where $t$ is the scaling parameter and the notation $*$ denotes convolution over
$\mathbf{x}$. $G(\mathbf{x};t): \mathbb{R}^N \times \mathbb{R}_+ \to \mathbb{R}$ is the Gauss function

$$G(\mathbf{x};t) = \frac{1}{(2\pi\sigma^2)^{N/2}}\,e^{-\frac{\mathbf{x}\cdot\mathbf{x}}{2\sigma^2}},$$

where $t = \sigma^2$. The $n$th order partial derivative of an image in scale-space can be
found as

$$\frac{\partial^n}{\partial x_{i_1}\cdots\partial x_{i_n}}\big(G(\mathbf{x};t) * L(\mathbf{x})\big) = \frac{\partial^n G(\mathbf{x};t)}{\partial x_{i_1}\cdots\partial x_{i_n}} * L(\mathbf{x}),$$

where $x_i$ denotes the $i$th element of $\mathbf{x}$. In this paper we will in general use tensor
notation and Einstein’s summation convention when using image derivatives.

⁴ The Hausdorff dimension can intuitively be viewed as a scaling exponent of the space
filling of the graph in question.

Normalisation of image derivatives has been proposed by several authors
[15,21,14]. The standard normalisation of the $n$th image derivative in scale-space,
based on dimensional analysis [14], is

$$L_{i_1\cdots i_n,\mathrm{norm}} = t^{n/2}\,L_{i_1\cdots i_n},$$

which for the 1st order of derivation is the same as $(\partial/\partial x)_{\mathrm{norm}} = \sigma\,\partial/\partial x$.
Lindeberg proposes [15,16,17] another method of normalisation of image derivatives.
He proposes that the $n$th order derivatives could be normalised as

$$L_{i_1\cdots i_n,\gamma_n\text{-norm}} = t^{\gamma_n}\,L_{i_1\cdots i_n},$$

where $\gamma_n = n\gamma/2$ and $\gamma$ is a free normalisation parameter. In conjunction with
feature detection, Lindeberg has determined $\gamma$ by an analysis of model patterns
reflecting the features under consideration.



3.2 Scale-Space Normalisation Using the Fractal Dimension 

We propose that $\gamma_n$ can be stated as a relation of $\alpha$ (i.e. $H$), the topological
dimension $N$ of the image, and the order $n$ of derivation. This is based on an
assumption that images of natural scenes may be modelled as fBms and that
normalised derivatives must have equal energy at all scales.

We will investigate quadratic differential image invariants of the form

$$I^{(n)} = L_{i_1\cdots i_n}L_{i_1\cdots i_n}.$$

We say that invariants of this kind are of the $n$th order of derivation. In the
following we examine the $L_1$-norm of such invariants. This corresponds to looking
at the $L_2$-norm of image derivatives; that is, we examine the scaling of the energy of
image derivatives. Note furthermore that $\|I^{(n)}\|_1$ also equals the $L_1$-norm of any
other invariant quadratic in $L$ of total order of derivation $2n$ (see [22]).
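Theorem 1. If $f_H(\mathbf{x}): \mathbb{R}^N \to \mathbb{R}$ is an N-D fBm and $L(\mathbf{x};t): \mathbb{R}^N \times \mathbb{R}_+ \to \mathbb{R}$ is
the scale-space of $f_H(\mathbf{x})$, then the $n$th order invariants $I^{(n)}(\mathbf{x};t)$ in this
scale-space can be normalised to equal energy on all scales by the relation

$$I^{(n)}_{\mathrm{norm}}(\mathbf{x};t) = t^{\gamma_n}\,I^{(n)}(\mathbf{x};t),$$

where $\gamma_n = -\alpha/2 + n + N/2$.

Proof. The proof is inspired by a similar analysis of the power spectrum of
images of natural scenes by Lindeberg [15,20]. By use of Parseval’s identity
we find

$$\|I^{(n)}(\cdot;t)\|_1 = \int_{\mathbb{R}^N} L_{i_1\cdots i_n}L_{i_1\cdots i_n}\,d\mathbf{x} = \frac{1}{(2\pi)^N}\int_{\mathbb{R}^N} \big|\hat f_H(\omega)\big|^2\,\big|\hat G_{i_1\cdots i_n}(\omega;t)\big|^2\,d\omega,$$

where $i^2 = -1$, and $\hat f_H(\omega)$ and $\hat G_{i_1\cdots i_n}(\omega;t) = (i\omega_{i_1})\cdots(i\omega_{i_n})\,e^{-|\omega|^2 t/2}$ respectively are
the Fourier transformed image and the $n$th order differentiated Gaussian. By (1),
the integrand is therefore proportional to $|\omega|^{2n-\alpha}e^{-|\omega|^2 t}$. Using

$$\int_0^\infty x^m e^{-a x^2}\,dx = \frac{\Gamma\big((m+1)/2\big)}{2a^{(m+1)/2}},$$

and introducing N-D spherical coordinates, we find

$$\int \rho^{2n-\alpha+N-1}e^{-\rho^2 t}\,d\rho\,d\varphi_1\cdots d\varphi_{N-1} = K\,t^{\alpha/2-n-N/2},$$

where $K$ is a constant independent of $t$. Requiring $t^{\gamma_n}\|I^{(n)}(\cdot;t)\|_1$ to be constant
over scale, we arrive at $\gamma_n = -\alpha/2 + n + N/2$.

□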



The normalisation relation of Theorem 1 gives us a special case when handling 
the 0th order of derivation, meaning the case of the undifferentiated scale-space. 
That is, we scale-normalise the scale-space by an exponent introduced by the 
fractal dimension of the original image. After doing so, the normalisation of the 
nth order derivation is just the normalisation based on dimensional analysis. 
This special case of the 0th order of derivation comes from the fact that the fBm 
is the fractional derivative or integral of the Brownian motion. 

A benefit of the proposed normalisation method is that the normalisation
relation can be used as a method for estimation of the fractal dimension of
images. This can be done by calculating the $L_1$-norm of a collection of differentiated
scale-space images and then fitting the logarithmic norm values to a straight line.
We use this method in Sec. 4 to estimate the fractal dimension of synthetic and
real images. We will not conduct a comparative study of this method and other
methods for estimation of the fractal dimension of images (see [8] for a study of
other methods), but merely point out the existence of the method.

Table 1. This table shows the measure of feature strength used by Lindeberg for
feature detection with automatic scale selection using his γ-normalisation. We have
calculated the corresponding values of H and D_H using our definition of γ. This table
is a reproduction of a table from [17] with an extension of the columns for H, D_H and
topological dimension T of the features. Note that a simple relation exists between the
fractal dimension D_H and the topological dimension T. The relation between γ and T
is not as straightforward.

Feature type   Normalized strength measure    γ     H     D_H   T
Edge                                          1/2   1     2     1
Ridge          t^{2γ} (L_pp − L_qq)²          3/4   1     2     1
Corner                                        1     1/2   2.5   0
Blob           t^γ ∇²L                        1     1/2   2.5   0

In conjunction with feature detection, we must use the fractal dimension of
the image in a neighbourhood of the feature of interest. This suggests a simple
relation between the topological dimension of the feature and a suitable choice of
fractal dimension. In Table 1 we have listed Lindeberg’s [17] suggested normalized
measures of feature strength. For each feature we have calculated the $H$ and
$D_H$ values that correspond to his suggested $\gamma$ values. It is interesting to notice
that corners and blobs have a fractal dimension of 2.5, while edges and ridges only
have a fractal dimension of 2. The topological dimension of corners and blobs is
0, while edges and ridges have a topological dimension of 1. Around a corner or
a blob, we would expect the null hypothesis of $H = \frac{1}{2}$. This is not expected to
be true in a neighbourhood of 1D features owing to their spatial extent, and we
see that Lindeberg’s choice of $\gamma$ leads us to the hypothesis of $H = 1$ for both 1D
features.
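The $H$ and $D_H$ columns of Table 1 follow from matching exponents. Assuming, as for Lindeberg's $\gamma$-normalised derivatives, that a strength measure quadratic in $n$th order derivatives carries the prefactor $t^{n\gamma}$, setting $n\gamma = \gamma_n = -\alpha/2 + n + N/2$ from Theorem 1 gives $\alpha$, and then $H = (\alpha - 1)/2$ and $D_H = N + 1 - H$. The small sketch below (hypothetical function name, exact fractions) reproduces three of the table's rows.

```python
from fractions import Fraction as Fr

def lindeberg_gamma_to_fractal(gamma, n, N=2):
    """Map Lindeberg's gamma for an order-n quadratic measure to (alpha, H, D_H)."""
    alpha = 2 * (n + Fr(N, 2) - n * gamma)   # solve n*gamma = -alpha/2 + n + N/2
    H = (alpha - 1) / 2                      # alpha = 2H + 1
    D_H = N + 1 - H                          # D_H = N + (1 - H)
    return alpha, H, D_H

# (gamma, derivative order n) for three rows of Table 1
rows = {"edge": (Fr(1, 2), 1), "ridge": (Fr(3, 4), 2), "blob": (Fr(1), 2)}
results = {name: lindeberg_gamma_to_fractal(g, n) for name, (g, n) in rows.items()}
for name, (alpha, H, D_H) in results.items():
    print(f"{name:6s} alpha={alpha} H={H} D_H={D_H}")
```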



4 Experiments 

We have conducted several experiments on synthetic and real 2D images in order to study the normalisation of digitized images. As stated earlier, we can use the normalisation method to find the fractal dimension of images by calculating unnormalized derivatives of the scale-space of the considered image. From this unnormalized scale-space we can estimate the value of $\gamma$ and calculate the Hausdorff dimension $D_H$. In the same manner one can obtain estimates of the local fractal dimension at a point in the original image. The fractal dimension of a point could be viewed as a contradiction in terms, but it is nevertheless possible to assign some meaning to this concept owing to an intrinsic property of scale-space: a point in scale-space corresponds to a neighbourhood in the underlying image.
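The estimation step described above can be sketched as an ordinary least-squares fit on doubly logarithmic axes. This is a minimal sketch, not the authors' implementation. The function name `log_slope` is hypothetical, and the $L_1$-norms are assumed to have been computed already. The text does not say whether the slope is taken against $\sigma$ or against $t = \sigma^2$, so the sketch assumes $\log\sigma$.

```python
import math


def log_slope(scales, norms):
    """Least-squares slope of log(norm) against log(scale).

    Assumes the slope is taken against log(sigma); fitting against
    log(t) with t = sigma**2 would halve the result.
    """
    if len(scales) != len(norms) or len(scales) < 2:
        raise ValueError("need at least two matching (scale, norm) pairs")
    xs = [math.log(s) for s in scales]
    ys = [math.log(v) for v in norms]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Usage: L1-norms that follow an exact power law sigma**(-1.5) give gamma = -1.5.
sigmas = [2.0, 2.7, 3.65, 4.93, 6.66]
norms = [5.0 * s ** -1.5 for s in sigmas]
print(log_slope(sigmas, norms))
```

On real data the points do not lie on a straight line. The fitted slope is then only an average over the chosen scale interval, which is why the choice of that interval matters.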
Table 2. These tables show our estimated $\gamma$ values for synthetic images. The synthetic images used, and the corresponding graphs of the $L_1$-norm of $L_iL_i$ and $L_{ij}L_{ji}$, are depicted in Fig. 1. The top table shows the values for a 2-D image differentiated $n = 1$ times ($L_iL_i$) and the bottom table shows the values for a 2-D image differentiated $n = 2$ times ($L_{ij}L_{ji}$). Each table contains 4 synthetic images with different $\alpha$ values. The $\gamma$ values are estimated by first constructing the synthetic image with the specified $\alpha$ parameter and then calculating a series of 10 differentiated scale-space images of ascending scales. For each of these 10 images we have calculated the $L_1$-norm, which can be seen in the graphs of Fig. 1. These graphs reveal an inaccuracy of the estimated $L_1$-norms at high scales. We therefore chose to use only the $L_1$-norms of the first 5 images of the scale-space ($\sigma \in [2.0, 6.7)$) for our estimation of $\gamma$. The estimated $\gamma$ value is the logarithmic slope of the $L_1$-norms of the scale-space images. The estimated values for the synthetic images are not exact because the images we used were small. That is, the span from the inner scale to the outer scale of the images is not sufficient to establish $\gamma$ as a single global average over all scales. Note that $\alpha = 2$ corresponds to a classical Brownian motion.



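Top table: $n = 1$ ($L_iL_i$)

| $\alpha$ | Estimated $\gamma$ | Actual $\gamma$ | Relative error |
|---|---|---|---|
| 1 | -1.57 | -1.5 | -4.46 % |
| 2 | -1.15 | -1 | -15.0 % |
| 2.5 | -0.96 | -0.75 | -28.0 % |
| 3 | -0.78 | -0.5 | -56.0 % |

Bottom table: $n = 2$ ($L_{ij}L_{ji}$)

| $\alpha$ | Estimated $\gamma$ | Actual $\gamma$ | Relative error |
|---|---|---|---|
| 1 | -2.55 | -2.5 | -2.0 % |
| 2 | -2.11 | -2 | -5.5 % |
| 2.5 | -1.89 | -1.75 | -8.0 % |
| 3 | -1.68 | -1.5 | -12.0 % |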
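The scale sampling behind Table 2 and Fig. 1 can be checked with a few lines of code. The sketch uses 10 scales from $\sigma = 2$ to $\sigma = 30$ with exponentially growing increments, of which only the first five are used in the fit. The helper names are hypothetical. `theoretical_gamma` implements $\gamma = \alpha/2 - n - 1$, an empirical fit that reproduces every entry in the "Actual $\gamma$" column for $N = 2$. It is inferred from the table and is not stated in the text.

```python
def exponential_scales(sigma_min, sigma_max, count):
    """Scales with exponentially growing increments (a geometric sequence)."""
    ratio = (sigma_max / sigma_min) ** (1.0 / (count - 1))
    return [sigma_min * ratio ** k for k in range(count)]


def theoretical_gamma(alpha, n):
    """Empirical fit to the 'Actual gamma' column of Table 2 (N = 2 only)."""
    return alpha / 2.0 - n - 1.0


scales = exponential_scales(2.0, 30.0, 10)
fit_scales = scales[:5]  # only the first five scales enter the slope fit
print([round(s, 2) for s in fit_scales])
for n in (1, 2):
    print(n, [theoretical_gamma(a, n) for a in (1, 2, 2.5, 3)])
```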



It is the authors' opinion that, in principle, all theory of fractal measures may be reformulated in the inherently well-posed framework of linear scale-space theory, thereby easing the operationalisation of fractal measures.

In Table 2 we show some results for small synthetic images (see the images 
in Fig. 1). The synthetic images used for these experiments were constructed 
in the frequency domain and were given a power spectrum proportional to 
and a random phase. We have calculated the $L_1$-norm of different images from two scale-spaces of $L_iL_i$ and $L_{ij}L_{ji}$ images. On this basis we have estimated the $\gamma$ values for the synthetic images and compared them to the theoretical values from the continuous-domain theory.

The method of finding the fractal dimension that we propose is fairly accurate on synthetic images of known fractal dimension. From Table 2 we see that the accuracy of our method deteriorates as $\alpha$ increases. The reason for this inaccuracy is that when the $\alpha$ value is increased, the synthetic image has structure on increasingly large scales, and when the $\alpha$ value of the image becomes
Fig. 1. In this figure we show the synthetic images with different $\alpha$ values and the corresponding graphs of the $L_1$-norm of $L_iL_i$ and $L_{ij}L_{ji}$, which we used to estimate the $\gamma$ values in Table 2. The synthetic images all have 256 × 256 pixels and $\alpha = 1, 2, 2.5, 3$ from the top down. The $L_1$-norm graphs were produced by calculating a scale-space for the two sets of derivatives. This scale-space has 10 different scales between $\sigma = 2$ and $\sigma = 30$ with exponentially growing increments. The graphs show that high scales make the estimate of the $L_1$-norm inaccurate, i.e. the estimate becomes too small. This inaccuracy is caused by discretisation effects introduced by the outer scale of the image.
Fig. 2. We have calculated the invariants $L_iL_i$, $L_{ij}L_{ji}$ and $L_{ijk}L_{kji}$ of the garden image from Fig. 3. These graphs show the $L_1$-norms of these 3 scale-space invariants under the two normalisations compared in the text. The solid lines correspond to $L_iL_i$, the dashed lines to $L_{ij}L_{ji}$ and the dotted lines to $L_{ijk}L_{kji}$. The estimated slopes are $\gamma = 1.13, 1.10, 1.12$ ($n = 1, 2, 3$) for one normalisation and $\gamma = -0.87, -0.95, -0.96$ ($n = 1, 2, 3$) for the other.



large enough, the outer scale of the large structures exceeds the outer scale of the image, which misleads our method. Furthermore, our results are biased by spectral leakage, because the Fourier transform introduces artificial periods into the images. Table 2 also shows that increasing the order of differentiation increases the accuracy of the method. The reason is that differentiation enhances the fine structure of the image, which effectively moves the examined scale interval towards smaller scales. In real images, noise from the capture device will exhibit a different structure than the random process of the scene. In general this noise is more uncorrelated, and a scale interval of smaller scales will exhibit structure that stems merely from the capture device. That is, we must choose an appropriate scale interval if we wish to measure scale characteristics.

We expect a logarithmic relation between the scale and the $L_1$-norm of the differentiated scale-space images. From Table 2 and Fig. 1 we can see that our proposed method for normalisation is quite reasonable for synthetic images. In order to examine our method on real images, we have calculated the $L_1$-norm of invariants of increasing order of differentiation of the garden image from Fig. 3. We have normalised the calculated invariants in two ways, one corresponding to our normalisation method and one corresponding to the standard normalisation method. This lets us examine the scaling property of the image independently of the order of derivation. The slope of the logarithmic plot corresponds to $\gamma$, and we would expect this slope to be approximately the same for all orders of derivation only for our normalisation method. The results can be viewed in Fig. 2. From this figure we conclude that our normalisation method seems a reasonable choice. However, the $\gamma$ of the standard normalisation method is, for this image, also fairly independent of the order of derivation. This inconclusive experiment therefore calls for a thorough evaluation of the scaling properties of a large ensemble of images of natural scenes.
Table 3. This table shows our estimated $\gamma$ values and the corresponding $H$ and Hausdorff dimension $D_H$. The values of $H$ were calculated through $H = n + \frac{N-1}{2} + \gamma$ and the values of $D_H$ through $D_H = N + (1 - H)$. The dimension of the images was $N = 2$, and they were differentiated as $L_{ij}L_{ji}$ ($n = 2$). The $\gamma$ values were estimated in the same fashion as in Table 2, and the images used can be seen in Fig. 3. The estimated values of $D_H$ agree with the finding of Bialek [6] that the Hausdorff dimension of images of natural scenes is not necessarily close to $D_H = 2.5$. Unfortunately we have no way of determining the error on the results in this table.



| Image | Estimated $\gamma$ | $H$ | $D_H$ |
|---|---|---|---|
| Garden | -1.90 | 0.60 | 2.40 |
| X-rayed bone | -1.62 | 0.88 | 2.12 |
| Water Lilly | -1.53 | 0.97 | 2.03 |
| Sea weed | -1.98 | 0.52 | 2.48 |
| Grains of sand | -2.09 | 0.41 | 2.59 |
| Satellite clouds | -1.89 | 0.61 | 2.39 |
| Landscape | -1.75 | 0.75 | 2.25 |
| Trees | -1.82 | 0.68 | 2.32 |



We have also tried to estimate the Hausdorff dimension of some 2-D images of natural scenes. The results can be viewed in Table 3.
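The arithmetic that turns an estimated $\gamma$ into $H$ and $D_H$ can be reproduced as follows. The sketch uses $H = n + \frac{N-1}{2} + \gamma$ and $D_H = N + (1 - H)$. The first relation is the form consistent with every row of Table 3. The function names are hypothetical.

```python
def hurst_exponent(gamma, n, N=2):
    """H from the estimated gamma, for an N-D image differentiated n times."""
    return n + (N - 1) / 2.0 + gamma


def hausdorff_dimension(H, N=2):
    """D_H = N + (1 - H)."""
    return N + (1.0 - H)


# Estimated gamma values from Table 3 (n = 2, N = 2).
TABLE3_GAMMA = {
    "Garden": -1.90,
    "X-rayed bone": -1.62,
    "Water Lilly": -1.53,
    "Sea weed": -1.98,
    "Grains of sand": -2.09,
    "Satellite clouds": -1.89,
    "Landscape": -1.75,
    "Trees": -1.82,
}

for name, gamma in TABLE3_GAMMA.items():
    H = hurst_exponent(gamma, n=2)
    print(f"{name:<16} H = {H:.2f}  D_H = {hausdorff_dimension(H):.2f}")
```

The same relation also links the void hypothesis $H = 1/2$ to $D_H = 2.5$, the value the text associates with corners and blobs.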



5 Conclusion 

We have related Lindeberg’s [15,16,17] scale-space normalisation method to the notion of fractal dimension, assuming that images of natural scenes can be modelled by fractal Brownian motions, and we propose that feature strength measures be normalised using the Hausdorff dimension of the local image.

We have found a normalisation expression that has the Hausdorff dimension as a parameter. Through this expression we have also found a relation between the topological dimension and the fractal dimension of the local image around a feature (see Table 1). We conjecture (for future experimental testing):

The topological dimension of the feature uniquely determines the scale- 
space normalisation parameter. 

We propose a further investigation into the relation between different features and their Hausdorff dimension. It would be interesting to see whether the results described in Table 1 can be generalized into a general relation between the topological dimension of features and the local fractal dimension of the image. Furthermore, we suggest a thorough investigation of the scaling properties of images of natural scenes using a large ensemble of images.






The Hausdorff Dimension and Scale-Space Normalisation of Natural Images 



Fig. 3. Here we show the images for which we have estimated the fractal dimension (see Table 3). All images are grey-level images. The first six images have 256 × 256 pixels and the last two have 512 × 512 pixels. Going from the top left corner in reading direction, the images are: Garden, X-rayed bone, Water Lilly, Sea weed, Grains of sand, Satellite clouds, Landscape and Trees.
Abstract. The lattice Boltzmann method has attracted more and more attention as an alternative to traditional numerical methods for solving partial differential equations and modeling physical systems. The idea of the lattice Boltzmann method is to construct a simplified discrete microscopic dynamics that simulates the macroscopic model described by the partial differential equations. In this paper, we present lattice Boltzmann models for nonlinear diffusion filtering. We show that image-feature-selective smoothing can be achieved by making the relaxation parameter in the lattice Boltzmann equation dependent on image features and direction. The models naturally lead to numerical algorithms that are easy to implement. Experimental results on both synthetic and real images are described.



1 Introduction 

Broadly speaking, there are two ways to use computers to make progress in understanding physical phenomena. The first approach is to use computers as a tool to solve the partial differential equations (PDEs) that describe the macroscopic model. In this approach, the computers are used to treat the mathematical equations rather than the physical phenomenon directly. As the equations become more and more complicated, solving them becomes more and more difficult. The second approach is to use computers as a kind of experimental laboratory to simulate the phenomenon of interest. The idea is to design a synthetic model in which the physical laws are expressed in terms of simple local rules on a discrete space-time structure. Such models include the so-called lattice gas automata (LGA) and the more recent lattice Boltzmann (LB) models in fluid dynamics.

In lattice gas and lattice Boltzmann models, particles hop from site to site on 
a lattice at each tick of a clock. When particles meet they collide, but they always 
stay on the grid and appropriate physical quantities are always conserved. The 
long time evolution of this discrete microscopic dynamics is able to reproduce 
the complex physical phenomena investigated. The advantage of lattice gas and 
lattice Boltzmann methods is that they provide insight into the microscopic 

* The research of the authors was supported by ARO Grant DAA HO 49610326, ONR Grant N00014-90-J1343, and DEPSCoR Grant N00014-97-10806.
process, lead to easily implemented, highly parallel algorithms, and can handle complicated boundary and initial conditions.

The use of the lattice Boltzmann method has allowed the study of a broad class of systems that would have been difficult to study by other means. An example is flow through porous media [4]. In recent years, the adoption of the Bhatnagar-Gross-Krook (BGK) collision operator [10] in lattice Boltzmann calculations has made the lattice Boltzmann model computationally more efficient.

In this work we apply the lattice Boltzmann method to image processing, 
especially to nonlinear anisotropic diffusion of images. Anisotropic diffusion has 
been extensively used as an efficient nonlinear filtering technique in image pro- 
cessing. A considerable amount of research has been done in this area during the 
last decade [1,2,3,9,5,6,12]. For a complete list of references and an overview of 
nonlinear diffusion filtering see [13]. 

In this paper, we report the lattice Boltzmann models presented in [7] for nonlinear diffusion filtering. We also present a lattice Boltzmann model for image smoothing by reaction-diffusion. We show that image-feature-selective smoothing can be achieved by making the relaxation parameter in the lattice Boltzmann equation dependent on image features (e.g., edges) and direction.

The paper is organized as follows. In Section 2 we give a brief introduction 
to the general lattice Boltzmann model. Section 3 describes the lattice Boltz- 
mann model for nonlinear isotropic diffusion filtering. In Section 4, we discuss 
the lattice Boltzmann model for anisotropic diffusion filtering. Next, in Section 
5 we present the lattice Boltzmann model for reaction-diffusion filtering. Exper- 
imental results are shown in Section 6. We conclude with a summary in Section 
7. 

2 The General Lattice Boltzmann Model 

In general, a lattice Boltzmann model is built on a lattice together with the lattice vectors $\mathbf{e}_\alpha$ ($\alpha = 0, 1, \ldots, b$). On each node there is a set of particle distribution functions $\{f_\alpha\}$ ($\alpha = 0, 1, \ldots, b$), with each $f_\alpha$ corresponding to the vector direction $\mathbf{e}_\alpha$. The vector $\mathbf{e}_\alpha$ can be considered as the particle velocity. Usually $\mathbf{e}_0$ denotes the rest particles. The microscopic dynamics consists of two steps: translation from node to node along direction $\mathbf{e}_\alpha$, and redistribution of the particle density at each node during the collision step. These two steps are described by the following lattice Boltzmann equation

$$f_\alpha(\mathbf{x} + \mathbf{e}_\alpha, t + 1) = f_\alpha(\mathbf{x}, t) + \Omega_\alpha(\mathbf{x}, t), \quad (\alpha = 0, 1, \ldots, b), \qquad (1)$$

where $\Omega_\alpha(\mathbf{x}, t)$ is the collision operator, which depends on the distribution functions $f_\alpha$.

Usually, the only restrictions on the collision operator $\Omega_\alpha(\mathbf{x}, t)$ are that it conserves mass,

$$\sum_{\alpha=0}^{b} \Omega_\alpha(\mathbf{x}, t) = 0, \qquad (2)$$
and that it conserves momentum,

$$\sum_{\alpha=0}^{b} \mathbf{e}_\alpha \Omega_\alpha(\mathbf{x}, t) = 0. \qquad (3)$$

The mass $m(\mathbf{x}, t)$ (or another quantity, depending on the problem of interest) at position $\mathbf{x}$ and time $t$ is given by

$$m(\mathbf{x}, t) = \sum_{\alpha=0}^{b} f_\alpha(\mathbf{x}, t). \qquad (4)$$

By adopting the simple lattice-BGK model [10], the collision term in (1) can be replaced by the single-time-relaxation approximation $\Omega_\alpha = -\frac{1}{\tau}\left(f_\alpha - f_\alpha^{eq}\right)$, and (1) becomes

$$f_\alpha(\mathbf{x} + \mathbf{e}_\alpha, t + 1) = f_\alpha(\mathbf{x}, t) - \frac{f_\alpha(\mathbf{x}, t) - f_\alpha^{eq}(\mathbf{x}, t)}{\tau}, \quad (\alpha = 0, 1, \ldots, b), \qquad (5)$$

where $f_\alpha^{eq}(\mathbf{x}, t)$ denotes the appropriately chosen local equilibrium distribution functions, and $\tau$ is the relaxation time, which controls the rate of approach to equilibrium. $\tau$ must be chosen greater than 1/2 to ensure numerical stability [10].
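The two microscopic steps of (5), collision followed by streaming, can be sketched on a toy periodic one-dimensional lattice. The lattice (three velocities $0, +1, -1$), the weights and the equilibrium $f_\alpha^{eq} = w_\alpha m$ are illustrative assumptions, since Section 2 does not fix them. This diffusion-type equilibrium conserves mass but not momentum. Because the weights sum to one, summing (5) over $\alpha$ leaves $m$ unchanged, and the sketch checks this numerically. The function name is hypothetical.

```python
def bgk_step(f, tau, weights, velocities):
    """One lattice-BGK update, Eq. (5), on a periodic 1D lattice.

    f[a][x] is the distribution for direction a at node x. The equilibrium
    f_eq[a](x) = weights[a] * m(x) is an illustrative, diffusion-type choice.
    """
    if tau <= 0.5:
        raise ValueError("tau must exceed 1/2 for stability")
    n_dirs, n_nodes = len(f), len(f[0])
    # Macroscopic mass m(x) = sum_a f_a(x), Eq. (4).
    m = [sum(f[a][x] for a in range(n_dirs)) for x in range(n_nodes)]
    new_f = [[0.0] * n_nodes for _ in range(n_dirs)]
    for a in range(n_dirs):
        for x in range(n_nodes):
            f_eq = weights[a] * m[x]
            post = f[a][x] - (f[a][x] - f_eq) / tau          # collision
            new_f[a][(x + velocities[a]) % n_nodes] = post  # streaming along e_a
    return new_f


WEIGHTS = [2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0]  # assumed weights for e = 0, +1, -1
VELOCITIES = [0, 1, -1]

# Start from a single spike of mass 1 in equilibrium and take a few steps.
n_nodes = 11
f = [[0.0] * n_nodes for _ in WEIGHTS]
for a, w in enumerate(WEIGHTS):
    f[a][5] = w
for _ in range(4):
    f = bgk_step(f, tau=1.0, weights=WEIGHTS, velocities=VELOCITIES)
mass = [sum(f[a][x] for a in range(3)) for x in range(n_nodes)]
print(round(sum(mass), 12), [round(v, 3) for v in mass])
```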

On the one hand, lattice Boltzmann models can be used as PDE solvers: by choosing an appropriate collision operator or equilibrium distribution, the lattice Boltzmann model can recover the PDE of interest. On the other hand, lattice Boltzmann models can be used as simulators: by specifying the microscopic collision rules, the lattice Boltzmann model can directly simulate the phenomena under investigation. The microscopic approach of the lattice Boltzmann model provides a clear physical picture, easy implementation of boundaries and fully parallel algorithms.

3 LB Model for Nonlinear Isotropic Diffusion Filtering 

For a two-dimensional discrete image of size M × M, the image domain can naturally be considered a square lattice. In this paper, we use the 9-velocity model for the square lattice with the velocity vectors

$$\mathbf{e}_\alpha = \begin{cases} (0, 0), & \alpha = 0,\\ \left(\cos[2\pi(\alpha-1)/8],\ \sin[2\pi(\alpha-1)/8]\right), & \alpha = 1, 3, 5, 7,\\ \sqrt{2}\left(\cos[2\pi(\alpha-1)/8],\ \sin[2\pi(\alpha-1)/8]\right), & \alpha = 2, 4, 6, 8, \end{cases} \qquad (6)$$

where $\mathbf{e}_0$ corresponds to the rest particles, which have speed 0. Here the quantity of interest is the image intensity $I(\mathbf{x}, t)$ instead of the mass at position $\mathbf{x}$ and time $t$. Parallel to the general LB model described in Section 2,
we have at time $t$ and position $\mathbf{x}$ the particle distribution functions $f_\alpha(\mathbf{x}, t)$ ($\alpha = 0, 1, \ldots, 8$), which can be imagined as the amount of intensity moving in the direction $\mathbf{e}_\alpha$. Corresponding to (4), we have

$$I(\mathbf{x}, t) = \sum_{\alpha=0}^{8} f_\alpha(\mathbf{x}, t). \qquad (7)$$

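We will use the lattice-BGK model (5). In order to achieve selective smoothing of the image, we make the relaxation parameter $\tau$ dependent on space, time and image features (edges), i.e.,

$$\tau(\mathbf{x}, t) = \phi(|\nabla G_\sigma * I|), \qquad (8)$$

where $\phi$ is some positive nonincreasing function, and

$$G_\sigma(\mathbf{x}) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{|\mathbf{x}|^2}{2\sigma^2}\right) \qquad (9)$$

is the Gaussian kernel.

We choose the local equilibrium distribution functions $f_\alpha^{eq}(\mathbf{x}, t)$ as follows:

$$f_\alpha^{eq}(\mathbf{x}, t) = c_\alpha I(\mathbf{x}, t), \qquad (10)$$

where the distribution factors $c_\alpha$ are defined by

$$c_\alpha = \begin{cases} 4/9, & \alpha = 0,\\ 1/9, & \alpha = 1, 3, 5, 7,\\ 1/36, & \alpha = 2, 4, 6, 8. \end{cases} \qquad (11)$$

Note that $\sum_{\alpha=0}^{8} c_\alpha = 1$.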

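The evolution of $f_\alpha$ ($\alpha = 0, 1, \ldots, 8$) is then governed by the following lattice Boltzmann equation

$$f_\alpha(\mathbf{x} + \mathbf{e}_\alpha, t + 1) = f_\alpha(\mathbf{x}, t) - \frac{f_\alpha(\mathbf{x}, t) - f_\alpha^{eq}(\mathbf{x}, t)}{\phi(|\nabla G_\sigma * I|)}, \quad (\alpha = 0, 1, \ldots, 8). \qquad (12)$$

From Section 2 we know that the relaxation parameter $\tau$ must be greater than 1/2 to ensure numerical stability. Therefore the nonincreasing function $\phi$ must be chosen such that $\phi(|\nabla G_\sigma * I|) > 1/2$. One possible choice of $\phi(|\nabla G_\sigma * I|)$ is

$$\phi(|\nabla G_\sigma * I|) = \frac{1}{2} + \frac{C}{1 + |\nabla G_\sigma * I|^2} \quad \text{for some positive constant } C. \qquad (13)$$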



Using the so-called Chapman-Enskog expansion we showed in [7] that the long-time behavior of the LB model described above recovers the following type of nonlinear isotropic diffusion equation

$$\partial_t I = \operatorname{div}\left(g(|\nabla G_\sigma * I|)\nabla I\right). \qquad (14)$$
Equation (14) was proposed in [2] as an improvement of the Perona-Malik model [9]. In [2] and [9] the function $g$ is chosen as some positive nonincreasing function vanishing at infinity.

The relation between the diffusion coefficient $g$ in the diffusion equation (14) and the relaxation parameter $\phi$ in the LB equation (12) is

$$g(|\nabla G_\sigma * I|) = \frac{1}{3}\left(\phi(|\nabla G_\sigma * I|) - \frac{1}{2}\right). \qquad (15)$$
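The constants of the nine-velocity model fit together as follows. The sketch assumes the standard nine-velocity weights $4/9$, $1/9$, $1/36$ for the distribution factors $c_\alpha$ of (11). With these weights the second moment $\sum_\alpha c_\alpha e_{\alpha x}^2$ equals $1/3$, the factor relating $g$ to $\phi$ in (15), and the mixed moment vanishes, so the diffusion is isotropic. The function names are hypothetical.

```python
import math
from fractions import Fraction


def d2q9_velocities():
    """Velocity vectors e_a of Eq. (6), rounded to exact integer components."""
    e = [(0, 0)]
    for a in range(1, 9):
        angle = 2.0 * math.pi * (a - 1) / 8.0
        length = 1.0 if a % 2 == 1 else math.sqrt(2.0)
        e.append((round(length * math.cos(angle)), round(length * math.sin(angle))))
    return e


def d2q9_weights():
    """Distribution factors c_a (assumed standard values), as exact fractions."""
    return [Fraction(4, 9)] + [
        Fraction(1, 9) if a % 2 == 1 else Fraction(1, 36) for a in range(1, 9)
    ]


def phi_isotropic(grad_norm, C):
    """Eq. (13): phi = 1/2 + C / (1 + |grad G_sigma * I|^2), always > 1/2 for C > 0."""
    return 0.5 + C / (1.0 + grad_norm ** 2)


def diffusivity(phi, cs2=1.0 / 3.0):
    """g = cs2 * (phi - 1/2), with cs2 = sum_a c_a e_ax^2."""
    return cs2 * (phi - 0.5)


E, W = d2q9_velocities(), d2q9_weights()
print(E)
print(sum(W), sum(w * ex * ex for w, (ex, _) in zip(W, E)))  # total weight, second moment
print(diffusivity(phi_isotropic(0.0, C=50.0)))               # g on a flat region
```

Flat regions ($|\nabla G_\sigma * I| = 0$) get the largest diffusivity $C/3$. Near edges $\phi$ approaches $1/2$ and $g$ approaches zero, which is the selective smoothing described above.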



4 LB Model for Anisotropic Diffusion Filtering 



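In order to achieve anisotropic diffusion, we need to make the relaxation parameter $\tau$ also depend on the direction vectors $\mathbf{e}_\alpha$ ($\alpha = 0, 1, \ldots, 8$),

$$\tau(\mathbf{x}, t, \alpha) = \phi_\alpha(\nabla G_\sigma * I). \qquad (16)$$

We can use the same equilibrium distribution functions as in Section 3,

$$f_\alpha^{eq}(\mathbf{x}, t) = c_\alpha I(\mathbf{x}, t), \qquad (17)$$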



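where the $c_\alpha$'s are the same as in Section 3. But if we still use the same LB equation (12), the total mass (intensity in our case) will not be conserved. To fix this problem, the natural approach is to choose different equilibrium distribution functions; an alternative is simply to add another term to the equation [8]. We will follow the approach in [8]. In this case the lattice Boltzmann equation becomes

$$f_\alpha(\mathbf{x} + \mathbf{e}_\alpha, t + 1) = f_\alpha(\mathbf{x}, t) - \frac{f_\alpha(\mathbf{x}, t) - f_\alpha^{eq}(\mathbf{x}, t)}{\phi_\alpha(\nabla G_\sigma * I)} + c_\alpha \Delta(\mathbf{x}, t), \qquad (18)$$

where

$$\Delta(\mathbf{x}, t) = \sum_{\beta=0}^{8} \frac{f_\beta(\mathbf{x}, t) - f_\beta^{eq}(\mathbf{x}, t)}{\phi_\beta(\nabla G_\sigma * I)} \qquad (19)$$

is added in equation (18) for mass (intensity) conservation. Summing (18) over $\alpha$ and using $\sum_\alpha c_\alpha = 1$ shows that the collision step leaves $\sum_\alpha f_\alpha$ unchanged. As in Section 3, we also have

$$I(\mathbf{x}, t) = \sum_{\alpha=0}^{8} f_\alpha(\mathbf{x}, t). \qquad (20)$$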

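The relaxation parameter functions $\phi_\alpha(\nabla G_\sigma * I)$ ($\alpha = 0, 1, \ldots, 8$) in (18) are nonincreasing functions of some quantity derived from $\nabla G_\sigma * I$. Again, $\phi_\alpha(\nabla G_\sigma * I)$ must be greater than 1/2 to ensure numerical stability. In order to respect the symmetry of the lattice ($\mathbf{e}_\alpha = -\mathbf{e}_{\alpha+4}$), we require $\phi_\alpha = \phi_{\alpha+4}$. One possible choice of $\phi_\alpha(\nabla G_\sigma * I)$ is

$$\phi_\alpha(\nabla G_\sigma * I) = \frac{1}{2} + \frac{C}{1 + |\langle \mathbf{e}_\alpha, \nabla G_\sigma * I \rangle|^2} \quad \text{for some positive constant } C. \qquad (21)$$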




B. Jawerth, P. Lin, and E. Sinzinger



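In [7] we showed that the long-time evolution of $I(\mathbf{x}, t) = \sum_{\alpha=0}^{8} f_\alpha(\mathbf{x}, t)$ according to the LB equation (18) recovers the following nonlinear anisotropic diffusion equation

$$\partial_t I = \operatorname{div}(D \nabla I), \qquad (22)$$

where $D$ is the diffusion tensor with

$$D = \sum_{\alpha=0}^{8} c_\alpha \left(\phi_\alpha(\nabla G_\sigma * I) - \frac{1}{2}\right) \mathbf{e}_\alpha \mathbf{e}_\alpha^{T}. \qquad (23)$$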
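To see how (21) steers the diffusion, the tensor $D$ can be evaluated pointwise. The sketch assumes the nine-velocity vectors of (6), the weights $4/9$, $1/9$, $1/36$ for $c_\alpha$, and the form $D = \sum_\alpha c_\alpha(\phi_\alpha - \frac{1}{2})\mathbf{e}_\alpha\mathbf{e}_\alpha^T$. With all $\phi_\alpha$ equal, this reduces to $\frac{1}{3}(\phi - \frac{1}{2})$ times the identity, matching the isotropic case. For a strong horizontal gradient, the sketch shows diffusion across the edge being damped relative to diffusion along it. The names are hypothetical.

```python
E = [(0, 0), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
W = [4 / 9] + [1 / 9 if a % 2 == 1 else 1 / 36 for a in range(1, 9)]


def phi_aniso(a, grad, C):
    """Eq. (21): phi_a = 1/2 + C / (1 + <e_a, grad>^2)."""
    ex, ey = E[a]
    proj = ex * grad[0] + ey * grad[1]
    return 0.5 + C / (1.0 + proj * proj)


def diffusion_tensor(grad, C):
    """D = sum_a c_a (phi_a - 1/2) e_a e_a^T as a 2x2 nested list (assumed form)."""
    d = [[0.0, 0.0], [0.0, 0.0]]
    for a in range(9):
        k = W[a] * (phi_aniso(a, grad, C) - 0.5)
        e = E[a]
        for i in range(2):
            for j in range(2):
                d[i][j] += k * e[i] * e[j]
    return d


print(diffusion_tensor((0.0, 0.0), C=25.0))   # flat region: isotropic
print(diffusion_tensor((10.0, 0.0), C=25.0))  # vertical edge: D_xx << D_yy
```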

Nonlinear anisotropic diffusion filtering by equation (22) with different dif- 
fusion tensors has been studied in [12] and [3]. Some applications have been 
presented in [11]. 

Equation (21) gives only one possible choice of the relaxation parameter $\phi_\alpha(\nabla G_\sigma * I)$. For different filtering purposes, one can use different forms of $\phi_\alpha(\nabla G_\sigma * I)$.

5 LB Model for Reaction-Diffusion Filtering 

In this section we propose the lattice Boltzmann model for reaction-diffusion 
filtering. 

For diffusion-based filtering, one can also use a reaction-diffusion equation instead of a pure diffusion equation:

$$\partial_t I = \operatorname{div}\left(g(|\nabla G_\sigma * I|)\nabla I\right) + \beta (I_0 - I), \qquad (24)$$

where $I_0$ is the original image. The advantage of adding a reaction term is that it provides a nontrivial steady state. This eliminates the problem of choosing a stopping time when using the pure diffusion equation. The trade-off is that one has to determine $\beta$.

In this section we use the same notation as in Section 3. To obtain the LB model for the reaction-diffusion equation (24), we simply add another term to equation (12), resulting in the following LB equation:

$$f_\alpha(\mathbf{x} + \mathbf{e}_\alpha, t + 1) = f_\alpha(\mathbf{x}, t) - \frac{f_\alpha(\mathbf{x}, t) - f_\alpha^{eq}(\mathbf{x}, t)}{\phi(|\nabla G_\sigma * I|)} + c_\alpha \gamma \left(I_0(\mathbf{x}) - I(\mathbf{x}, t)\right), \qquad (25)$$

where $\phi(|\nabla G_\sigma * I|)$ is the same as in Section 3 and $\gamma$ is a parameter controlling the reaction speed. As in Sections 3 and 4, we use the LB equation (25) to update $f_\alpha$ ($\alpha = 0, 1, \ldots, 8$), use

$$I(\mathbf{x}, t) = \sum_{\alpha=0}^{8} f_\alpha(\mathbf{x}, t) \qquad (26)$$

to update $I$, and use

$$f_\alpha^{eq}(\mathbf{x}, t) = c_\alpha I(\mathbf{x}, t) \qquad (27)$$

as the equilibrium distribution functions.

Using a procedure similar to that in [7], one can derive the macroscopic equation (24) from the lattice Boltzmann equation (25) and equation (26).
6 Numerical Experiments

The implementation of our LB models is straightforward. Once an initial image $I(\mathbf{x}, 0)$ is given, we use $c_\alpha I(\mathbf{x}, 0)$ ($\alpha = 0, 1, \ldots, 8$) as the initial values for the LB equations, where the $c_\alpha$'s are given by (11). The relaxation parameters $\phi(|\nabla G_\sigma * I|)$ and $\phi_\alpha(\nabla G_\sigma * I)$ are calculated using the corresponding equations given in Sections 3 and 4 respectively. The equilibrium distribution functions $f_\alpha^{eq}(\mathbf{x}, t)$ are calculated using equation (10). The $f_\alpha$ ($\alpha = 0, 1, \ldots, 8$) are updated by the LB equations (12), (18) and (25) for the models in Sections 3, 4 and 5 respectively. After obtaining the updated $f_\alpha$, $I$ is updated by equation (7), and the next iteration starts.
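The iteration just described can be sketched in plain Python for the isotropic model of Section 3, with the reaction term of Section 5 as an option. This is a minimal sketch, not the authors' code. It assumes periodic image boundaries, a sampled Gaussian truncated at $3\sigma$ with central differences for $\nabla G_\sigma * I$, the weights $4/9$, $1/9$, $1/36$ for $c_\alpha$, and $\phi$ of the form (13). The function names are hypothetical. The anisotropic model of Section 4 would additionally need the direction-dependent $\phi_\alpha$ and the correction term of (18).

```python
import math

# Nine-velocity lattice of Eq. (6) and the assumed weights c_a.
E = [(0, 0), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
W = [4 / 9] + [1 / 9 if a % 2 == 1 else 1 / 36 for a in range(1, 9)]


def smoothed_gradient_norm(img, sigma):
    """|grad (G_sigma * I)| with a sampled Gaussian (truncated at 3 sigma),
    central differences and periodic boundaries."""
    h, w = len(img), len(img[0])
    r = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    total = sum(k)
    k = [v / total for v in k]
    rows = [[sum(k[j + r] * img[y][(x + j) % w] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    sm = [[sum(k[j + r] * rows[(y + j) % h][x] for j in range(-r, r + 1))
           for x in range(w)] for y in range(h)]
    return [[math.hypot((sm[y][(x + 1) % w] - sm[y][(x - 1) % w]) / 2,
                        (sm[(y + 1) % h][x] - sm[(y - 1) % h][x]) / 2)
             for x in range(w)] for y in range(h)]


def lb_filter(img, iterations, C=50.0, sigma=1.0, gamma=0.0):
    """Isotropic LB filter of Section 3 (gamma = 0) or the reaction-diffusion
    variant of Section 5 (gamma > 0). img is a list of rows of floats."""
    h, w = len(img), len(img[0])
    f = [[[W[a] * img[y][x] for x in range(w)] for y in range(h)] for a in range(9)]
    intensity = [row[:] for row in img]
    for _ in range(iterations):
        grad = smoothed_gradient_norm(intensity, sigma)
        new_f = [[[0.0] * w for _ in range(h)] for _ in range(9)]
        for y in range(h):
            for x in range(w):
                tau = 0.5 + C / (1.0 + grad[y][x] ** 2)        # Eq. (13), > 1/2
                react = gamma * (img[y][x] - intensity[y][x])  # reaction term, Eq. (25)
                for a in range(9):
                    f_eq = W[a] * intensity[y][x]              # Eq. (10)
                    post = f[a][y][x] - (f[a][y][x] - f_eq) / tau + W[a] * react
                    ex, ey = E[a]
                    new_f[a][(y + ey) % h][(x + ex) % w] = post  # stream along e_a
        f = new_f
        intensity = [[sum(f[a][y][x] for a in range(9)) for x in range(w)]
                     for y in range(h)]                        # Eq. (7)
    return intensity


# Usage: a noisy 8x8 step image; with gamma = 0 the total intensity is preserved.
noisy = [[(1.0 if x >= 4 else 0.0) + (0.2 if (x * 7 + y * 3) % 5 == 0 else 0.0)
          for x in range(8)] for y in range(8)]
out = lb_filter(noisy, iterations=5)
print(round(sum(map(sum, noisy)), 9), round(sum(map(sum, out)), 9))
```

With `gamma = 0` the collision and streaming steps only redistribute intensity, so the total is preserved up to rounding. A pure-Python loop like this is only practical for small images; a vectorised array implementation would be the natural next step.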

Figure 1 shows a synthetic image (256 × 256) with 35% of the pixels degraded and its “cleaned” version by the LB model in Section 3 with $\phi(|\nabla G_\sigma * I|) = 0.5 + 50/(1 + |\nabla G_\sigma * I|^2)$ ($\sigma = 1$) and 60 iterations. Figure 2 shows the same synthetic image with 70% of the pixels degraded and its “cleaned” version by the LB model in Section 4 with $\phi_\alpha(\nabla G_\sigma * I) = 0.5 + 25/(1 + |\langle \mathbf{e}_\alpha, \nabla G_\sigma * I \rangle|^2)$ ($\sigma = 1$) and 90 iterations. Figure 3 shows an enlarged detail (256 × 256) of an original infrared airborne radar image and its processed version by the LB model in Section 3 with $\phi(|\nabla G_\sigma * I|) = 0.5 + 25/(1 + |\nabla G_\sigma * I|^2)$ ($\sigma = 1$) and 12 iterations. Figure 4 shows a part of an original airborne Doppler radar image (256 × 256) and its processed version by the LB model in Section 4 with $\phi_\alpha(\nabla G_\sigma * I) = 0.56 + 10/(1 + |\langle \mathbf{e}_\alpha, \nabla G_\sigma * I \rangle|^2)$ ($\sigma = 1$) and 65 iterations.




Fig. 1. A synthetic image (256 × 256) with 35% of the pixels degraded (left) and its “cleaned” version by the LB model in Section 3 with 60 iterations (right).
Fig. 2. A synthetic image (256 × 256) with 70% of the pixels degraded (left) and its “cleaned” version by the LB model in Section 4 with 90 iterations (right).




Fig. 3. An enlarged detail (256 x 256) of an original infrared airborne radar image 
(left) and its processed version by the LB model in Section 3 with 12 iterations (right). 





Fig. 5. An original image (256 × 256) with added Gaussian noise with $\sigma = 30$ (left) and its processed version by the LB model in Section 5 (renormalized) (right).



Fig. 4. A part of an original airborne Doppler radar image (256 × 256) (left) and its processed version by the LB model in Section 4 with 65 iterations (right).
Figure 5 shows an original image (256 × 256) with added Gaussian noise with $\sigma = 30$ and its processed version by the LB model in Section 5 with $\phi(|\nabla G_\sigma * I|) = 0.5 + 30/(1 + |\nabla G_\sigma * I|^2)$ ($\sigma = 1$) and $\gamma = 0.03$.

7 Concluding Remarks 

In this paper, we have described lattice Boltzmann models for nonlinear diffusion filtering. We have shown that image-feature-selective smoothing can be achieved by making the relaxation parameter in the lattice Boltzmann equation dependent on image features (e.g., edges) and direction. The advantage of the lattice Boltzmann model is that it provides insight into the microscopic process and leads to easily implemented, highly parallel algorithms. We believe that the lattice Boltzmann method is also very helpful in exploring new models. By choosing a different equilibrium distribution in the lattice BGK model or, more generally, a different collision operator in the lattice Boltzmann model, one can not only recover some PDEs but also obtain new image processing models.
Abstract. We merge techniques developed in the Beltrami framework to deal with multi-channel, i.e. color, images with the Mumford-Shah functional for segmentation. The result is a color image enhancement and segmentation algorithm. The generalization of the Mumford-Shah idea includes a higher dimension and codimension and a novel smoothing measure for the color components and for the segmenting function, which is introduced via the $\Gamma$-convergence approach. We use the $\Gamma$-convergence technique to derive, through the gradient descent method, a system of coupled PDEs for the color coordinates and for the segmenting function.



1 Introduction 

Segmentation is one of the important tasks of image analysis, and much effort has been devoted to solving it. One can roughly classify segmentation methods into two classes: 1) global, i.e. histogram-based techniques, and 2) local, i.e. edge-based techniques. For the second class it was shown that a large number of algorithms, including different region growing methods coupled with edge-detection-based techniques, are closely related to the minimization of the Mumford-Shah functional [11]. This functional involves an interplay between an image, which is a two-dimensional object, and the contours that surround the objects in the image, which are one-dimensional curves. This functional was first suggested and analyzed by Mumford and Shah for gray-level images in [12]. It was later studied extensively; see e.g. [11] for an overview.

In particular, the $\Gamma$-convergence framework [1,2,3,15] was invented to overcome the problem of dealing with objects of different dimensionalities in the same functional. In the $\Gamma$-convergence framework, one replaces the functional by a different, parameter-dependent functional. The parameter controls the degree of approximation, such that the approximating functional equals the Mumford-Shah functional in the limit as the parameter goes to zero. In the approximating functional, the edge contours are replaced by a two-dimensional function which is close in shape to an edge indicator with certain smoothness
around the edges. The degree of smoothness depends on the approximation parameter, and the function approaches a Dirac delta function for the edges as the approximation parameter approaches zero.

In this study we address the generalization of this approach to color images. Methods that disregard the coupling between the spectral channels give up important information carried by the correlation between the color channels. Moreover, the Mumford-Shah model assumes that the image is smooth in the non-boundary regions, and this smoothness is formulated through an $L_2$ measure. It is known, though, that the $L_1$ norm performs better as an adaptive smoothing measure [17]. It is therefore desirable to incorporate the $L_1$ norm, or another adaptive smoothing scheme, in the Mumford-Shah formulation of the segmentation problem. Recently, it was shown [19] that the Beltrami framework provides a proper generalization of the $L_1$ norm from gray-level to color images.

In the Beltrami framework, an image is treated as a two-dimensional Riemannian surface, restricted to be a graph, embedded in a higher-dimensional spatial-feature space. A gray-level image is embedded in $\mathbb{R}^3$, whose coordinates are $(x, y, I)$, and it is simply the graph of the intensity function $I(x, y)$. Similarly, a color image is embedded in a five-dimensional space whose coordinates are $(x, y, R, G, B)$. The induced metric of these surfaces is easily extracted. A measure known as the Polyakov action in high-energy physics is used as a generalization of the $L_2$ norm to any dimension and codimension, and to any geometry of the surface and of the embedding space. We and others have shown that this “geometric $L_2$” norm interpolates via a scaling parameter between the conventional, i.e. flat, $L_1$ and $L_2$ norms for gray-level images. For color images, it interpolates between the flat $L_2$ norm and a different norm, which is interpreted as the proper generalization of the Euclidean $L_1$ norm to color images [9,6].
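The interpolation claim for gray-level images can be checked directly on the area element $\sqrt{\det g}$ of the induced metric. The sketch assumes the common form $g_{ij} = \delta_{ij} + \beta^2 \sum_c \partial_i I^c \partial_j I^c$. Here $\beta$ stands for the scaling parameter between the spatial and feature axes mentioned above; the symbol and the helper names are our assumptions. For one channel $\det g = 1 + \beta^2|\nabla I|^2$, so $\sqrt{\det g}/\beta \to |\nabla I|$ (the $L_1$ integrand) as $\beta \to \infty$. In the other limit, $(\sqrt{\det g} - 1)/\beta^2 \to |\nabla I|^2/2$ (the $L_2$ integrand, up to a constant) as $\beta \to 0$.

```python
import math


def induced_metric(grads, beta=1.0):
    """Components (g11, g12, g22) of g_ij = delta_ij + beta^2 sum_c dI^c/di dI^c/dj.

    grads holds one (I_x, I_y) pair per channel: one pair for a gray-level
    image, three for an (R, G, B) image.
    """
    g11 = 1.0 + beta ** 2 * sum(ix * ix for ix, _ in grads)
    g12 = beta ** 2 * sum(ix * iy for ix, iy in grads)
    g22 = 1.0 + beta ** 2 * sum(iy * iy for _, iy in grads)
    return g11, g12, g22


def area_element(grads, beta=1.0):
    """sqrt(det g): the integrand of the area (Polyakov-type) functional."""
    g11, g12, g22 = induced_metric(grads, beta)
    return math.sqrt(g11 * g22 - g12 * g12)


gray = [(3.0, 4.0)]                              # |grad I| = 5
print(area_element(gray, 1e3) / 1e3)             # large beta: close to |grad I|
print((area_element(gray, 1e-3) - 1.0) / 1e-6)   # small beta: close to |grad I|^2 / 2
rgb = [(3.0, 4.0), (0.0, 1.0), (-1.0, 0.5)]
print(area_element(rgb, 1.0))                    # channels are coupled through g12
```

For color images the cross term $g_{12}$ mixes the channel gradients. The large-$\beta$ limit is therefore no longer a sum of per-channel $L_1$ norms, which is the coupling the text argues for.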

Our current study merges the $\Gamma$-convergence technique and the Beltrami framework for color images to yield a color and smoothing generalization of the Mumford-Shah segmentation functional.

The paper is organized as follows: in Section 2 we briefly review $\Gamma$-convergence and its application to gray-level image segmentation. Section 3 reviews the Beltrami framework. In Section 4 we present our color segmentation functional and derive a system of non-linear coupled partial differential equations (PDEs) as gradient descent equations for this functional. Results are presented in Section 5, and we summarize and conclude in Section 6.

2 $\Gamma$-Convergence Formulation

The Mumford-Shah functional includes three terms: a fidelity term, a smoothing term, and a penalty on the total length of the discontinuities. Let

$$F[I, K] = \int_{\Omega \setminus K} \left(\alpha (I - I_0)^2 + \beta |\nabla I|^2\right) dx\, dy + \mathcal{H}(K), \qquad (1)$$

where $I_0$ is the observed image, $I$ is the denoised image, $\Omega$ is the image domain, and $K$ is the set of discontinuities. The Hausdorff measure $\mathcal{H}(K)$ measures the
total length of the discontinuity set. The implicit assumption that underlies this functional is that an image is a piecewise smooth function. The first term penalizes a function that differs from the observed one, the second term penalizes large gradients, and the last term penalizes excessive use of segmentation curves. The minimizer places the segmenting curves along the most significant gradients and tries to smooth the function everywhere else without deviating too much from the original image. The parameters $\alpha$ and $\beta$ control the relative weight of the three terms.

It is difficult to minimize this functional numerically because of the large number of possible ways of placing the set of boundaries $K$ inside $\Omega$. To gain better control of the problem, both mathematically and numerically, it is convenient to approximate the functional. In the Γ-convergence framework, a new functional is proposed [2] in the form

$$F_c[I,E] = \int_\Omega \left( \alpha (I - I_0)^2 + \beta E^2 |\nabla I|^2 + c\,|\nabla E|^2 + \psi_c(E) \right) dx\,dy, \tag{2}$$

where, ideally, the function $E(x,y)$ is an edge indicator, such that $E(x_0,y_0) = 0$ when an edge passes through $(x_0,y_0)$ and $E(x,y) = 1$ otherwise. In this case, the second term in the approximating functional is identical to the second term in the Mumford-Shah functional. In fact, we demand that the segmenting function $E$ be a smooth function and use the $L_2$ norm to penalize discontinuities in $E$. The last term is constructed so that it forces $E$ to behave as an edge indicator, i.e. it pushes $E$ to 1 far from an edge. In the vicinity of an edge, the term $E^2|\nabla I|^2$ pushes $E$ to zero. Explicitly, Ambrosio and Tortorelli have chosen:

$$F_c[I,E] = \int_\Omega \left( \alpha (I - I_0)^2 + \beta E^2 |\nabla I|^2 + c\,|\nabla E|^2 + \frac{(E-1)^2}{4c} \right) dx\,dy. \tag{3}$$

One can show that in the limit $c \to 0$, the functional $F_c[I,E]$ approaches $F[I,K]$ in the sense of Γ-convergence, so that the minimizers of $F_c$ converge to the minimizer of $F$.
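To make the roles of the four terms in (3) concrete, here is a minimal sketch that evaluates a discretized Ambrosio-Tortorelli energy with NumPy. The function name `at_energy`, the central differences of `np.gradient`, and the plain Riemann sum are our own assumptions for illustration, not the discretization used in the paper.

```python
import numpy as np


def at_energy(I, I0, E, alpha, beta, c, h=1.0):
    """Discrete Ambrosio-Tortorelli energy F_c[I, E] of Eq. (3) (sketch).

    Central differences (np.gradient) and a Riemann sum on a uniform grid of
    spacing h are assumptions of this sketch, not the paper's scheme.
    """
    Iy, Ix = np.gradient(I, h)  # axis 0 = y (rows), axis 1 = x (columns)
    Ey, Ex = np.gradient(E, h)
    density = (alpha * (I - I0) ** 2                 # fidelity to the observed image
               + beta * E ** 2 * (Ix ** 2 + Iy ** 2)  # smoothing, switched off where E ~ 0
               + c * (Ex ** 2 + Ey ** 2)              # keeps the edge indicator smooth
               + (E - 1.0) ** 2 / (4.0 * c))          # pushes E toward 1
    return float(density.sum() * h * h)


# A vertical step edge: compare "no edge" (E = 1) with E switched off on the edge.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
E_on = np.ones_like(step)
E_cut = E_on.copy()
E_cut[:, 3:5] = 0.0
energy_on = at_energy(step, step, E_on, alpha=1.0, beta=100.0, c=0.1)
energy_cut = at_energy(step, step, E_cut, alpha=1.0, beta=100.0, c=0.1)
```

On this toy step edge, switching $E$ off on the two columns that straddle the jump lowers the energy from 400 to about 40.8 (with $\beta = 100$, $c = 0.1$). This is the mechanism by which the $E^2|\nabla I|^2$ term drives $E$ toward zero at edges.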

One can naturally envisage using a different norm, e.g. the $L_1$ norm, for the gradients of the denoised image and the segmenting function. The question is how to extend this idea to color images.

3 The Polyakov action 

Let us introduce a geometric viewpoint that enables us to generalize an adaptive smoothing algorithm to images of higher dimension and codimension.

There is an extensive literature on functionals of the type 

$$F[I] = \int dx\,dy\, \rho(|\nabla I|) = \int dx\,dy\, \rho\left( \sqrt{I_x^2 + I_y^2} \right), \tag{4}$$

where $\rho(s)$ is a function that is bounded from below. We propose to generalize it in the following way:

$$F[I,a,b,c] = \int dx\,dy\, f(a,b,c)\, \rho\left( \sqrt{a I_x^2 + 2b\, I_x I_y + c I_y^2} \right), \tag{5}$$
where $a$, $b$, $c$ and $f$ are functions of $x$ and $y$, and $f$ is positive definite. The interpretation of this generalization is geometric: images are viewed as embedding maps. Let us consider the important example $X : \Sigma \to \mathbb{R}^3$. Denote the local coordinates on the two-dimensional manifold $\Sigma$ by $(\sigma^1, \sigma^2)$; these are analogous to arc-length for a one-dimensional manifold, i.e. a curve. The map $X$ is explicitly given by

$$X^1(\sigma^1,\sigma^2) = \sigma^1, \quad X^2(\sigma^1,\sigma^2) = \sigma^2, \quad X^3(\sigma^1,\sigma^2) = I(\sigma^1,\sigma^2). \tag{6}$$

Since the local coordinates $\sigma^\mu$ are curvilinear, the squared distance is given by a positive definite symmetric bilinear form called the metric, whose components are denoted by $g_{\mu\nu}(\sigma^1,\sigma^2)$:

$$ds^2 = g_{\mu\nu}\, d\sigma^\mu d\sigma^\nu = g_{11} (d\sigma^1)^2 + 2 g_{12}\, d\sigma^1 d\sigma^2 + g_{22} (d\sigma^2)^2, \tag{7}$$

where we used the Einstein summation convention in the second equality: identical indices that appear once up and once down are summed over; see [4,18] for a short introduction to tensor calculus and covariance in the context of image analysis. We denote the inverse of the metric by $(g^{\mu\nu})$ and its determinant by $g$.

The Polyakov action is a generalization of $L_2$. It depends on both the image manifold and the embedding space. Denote by $(\Sigma, (g_{\mu\nu}))$ the image manifold and its metric, and by $(M, (h_{ij}))$ the spatial-feature manifold and its metric. We choose $\rho(|s|) = s \cdot s = s^i s^j h_{ij}$; then the map $X : \Sigma \to M$ has the following weight [13]



$$F[X^i, g_{\mu\nu}, h_{ij}] = \int d^m\sigma\, \sqrt{g}\, g^{\mu\nu} (\partial_\mu X^i)(\partial_\nu X^j)\, h_{ij}(X), \tag{8}$$

where $m$ is the dimension of $\Sigma$, and the ranges of the indices are $\mu,\nu = 1,\dots,\dim\Sigma$ and $i,j = 1,\dots,\dim M$. In the above expression $d^m\sigma\sqrt{g}$ is a volume element of the image manifold. The rest, i.e. $g^{\mu\nu}(\partial_\mu X^i)(\partial_\nu X^j)h_{ij}(X)$, is a generalization of $L_2$. It is important to note that this expression, as well as the volume element, does not depend on the local coordinates one chooses.

For our example in Eq. (6), we assume a diagonal form for the metric of the embedding space, i.e. $h_{ij}(x,y,I) = f_i(x,y,I)\,\delta_{ij}$ (no summation over indices here). We get the following functional

$$F[I, g_{\mu\nu}] = \int dx\,dy\, \sqrt{g} \left( g^{11} f_1 + g^{22} f_2 + \left( g^{11} I_x^2 + 2 g^{12} I_x I_y + g^{22} I_y^2 \right) f_3 \right), \tag{9}$$

which reduces, up to terms independent of $I$, to the form of the functional in Eq. (5) when the $f_i$'s are constants.

The minimization of $F$ with respect to the metric can be solved analytically for two-dimensional manifolds. The minimizing metric is the induced metric of the isometric embedding. Explicitly, it is given in terms of the embedding map and the metric of the embedding space:



$$g_{\mu\nu}(\sigma^1,\sigma^2) = h_{ij}(X)\,(\partial_\mu X^i)(\partial_\nu X^j). \tag{10}$$
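With the induced metric (10), the contraction $g^{\mu\nu}g_{\mu\nu}$ equals 2, so the Polyakov action (8) evaluated at that metric reduces to twice the area of the image surface. The sketch below checks this numerically for a gray-level image $X = (x, y, I)$ with a Euclidean $h_{ij}$; the helper names, the central differences, and the cell-centred grid are illustrative assumptions.

```python
import numpy as np


def induced_metric_gray(I, h=1.0):
    """Components of g_{mu nu} = h_ij d_mu X^i d_nu X^j (Eq. (10)) for X = (x, y, I)."""
    Iy, Ix = np.gradient(I, h)
    return 1.0 + Ix ** 2, Ix * Iy, 1.0 + Iy ** 2   # g11, g12, g22


def polyakov_induced(I, h=1.0):
    """Eq. (8) evaluated at the induced metric, Euclidean h_ij assumed."""
    g11, g12, g22 = induced_metric_gray(I, h)
    det = g11 * g22 - g12 ** 2                      # g = 1 + I_x^2 + I_y^2
    inv11, inv12, inv22 = g22 / det, -g12 / det, g11 / det
    contraction = inv11 * g11 + 2.0 * inv12 * g12 + inv22 * g22   # equals 2 pointwise
    return float((np.sqrt(det) * contraction).sum() * h * h)


n, h = 100, 0.01
x = (np.arange(n) + 0.5) * h                        # cell centres of the unit square
X, Y = np.meshgrid(x, x)
plane = 0.75 * X                                    # tilted plane with slope 0.75
action = polyakov_induced(plane, h)                 # 2 * sqrt(1 + 0.75**2) * 1
```

For the tilted plane the surface area over the unit square is $\sqrt{1 + 0.75^2} = 1.25$, so the action is 2.5.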
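Using standard methods of the calculus of variations, the Euler-Lagrange (EL) equations with respect to the embedding are (see [18] for the derivation)

$$-\frac{1}{2\sqrt{g}}\, h^{il} \frac{\delta F}{\delta X^l} = \frac{1}{\sqrt{g}}\, \partial_\mu\!\left( \sqrt{g}\, g^{\mu\nu} \partial_\nu X^i \right) + \Gamma^i_{jk}\, (\partial_\mu X^j)(\partial_\nu X^k)\, g^{\mu\nu} = 0, \tag{11}$$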



where the operator acting on $X^i$ in the first term is the natural generalization of the Laplacian from flat spaces to manifolds and is called the second order differential parameter of Beltrami [10], or in short the Beltrami operator. It is given in terms of the metric as

$$\Delta_g X^i = \frac{1}{\sqrt{g}}\, \partial_\mu\!\left( \sqrt{g}\, g^{\mu\nu} \partial_\nu X^i \right). \tag{12}$$
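For the gray-level embedding $X = (x, y, I)$ with a Euclidean $h_{ij}$, the induced metric gives $\sqrt{g}\, g^{\mu\nu}\partial_\nu I = \nabla I / \sqrt{g}$ with $g = 1 + I_x^2 + I_y^2$, so (12) becomes $\Delta_g I = \mathrm{div}(\nabla I/\sqrt{g})/\sqrt{g}$. A minimal discrete sketch, assuming a unit grid, periodic boundaries via `np.roll`, and forward differences for the gradient followed by backward differences for the divergence (the hypothetical name `beltrami_gray` is ours):

```python
import numpy as np


def beltrami_gray(I):
    """Delta_g I of Eq. (12) for X = (x, y, I), induced metric, Euclidean h_ij.

    Uses sqrt(g) g^{mu nu} d_nu I = grad(I) / sqrt(g), g = 1 + I_x^2 + I_y^2.
    Unit grid and periodic boundaries are assumptions of this sketch.
    """
    Ix = np.roll(I, -1, axis=1) - I          # forward difference in x
    Iy = np.roll(I, -1, axis=0) - I          # forward difference in y
    sqrt_g = np.sqrt(1.0 + Ix ** 2 + Iy ** 2)
    Fx, Fy = Ix / sqrt_g, Iy / sqrt_g        # the "flux" sqrt(g) g^{mu nu} d_nu I
    div = (Fx - np.roll(Fx, 1, axis=1)) + (Fy - np.roll(Fy, 1, axis=0))  # backward differences
    return div / sqrt_g


# For tiny gradients g ~ 1 and the operator reduces to the usual 5-point Laplacian.
rng = np.random.default_rng(0)
J = 1e-6 * rng.standard_normal((16, 16))
lap = np.roll(J, 1, 0) + np.roll(J, -1, 0) + np.roll(J, 1, 1) + np.roll(J, -1, 1) - 4 * J
```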



In the second term of Eq. (11), the $\Gamma^i_{jk}$ are the Levi-Civita connection's coefficients with respect to the metric $h_{ij}$ that describes the geometry of the embedding space [23]:

$$\Gamma^i_{jk} = \frac{1}{2}\, h^{il} \left( \partial_j h_{lk} + \partial_k h_{jl} - \partial_l h_{jk} \right). \tag{13}$$

This term is particularly important in color image analysis and processing, since some models of color perception assume a non-Euclidean color space.
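Eq. (13) can be evaluated numerically for any color metric given as a function of the color coordinates. The sketch below is an illustration under our own assumptions (hypothetical helper names, central finite differences for $\partial_l h_{jk}$). It checks the result against the Helmholtz-type metric $h_{ij} = \delta_{ij}/(I^i)^2$, i.e. $ds^2 = (d\log R)^2 + (d\log G)^2 + (d\log B)^2$, for which the only nonzero coefficients are $\Gamma^i_{ii} = -1/I^i$.

```python
import numpy as np


def christoffel(metric, x, eps=1e-5):
    """Gamma^i_{jk} of Eq. (13) at the point x; derivatives by central differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dh = np.empty((n, n, n))                       # dh[l, j, k] = d_l h_{jk}
    for l in range(n):
        step = np.zeros(n)
        step[l] = eps
        dh[l] = (metric(x + step) - metric(x - step)) / (2.0 * eps)
    # T[l, j, k] = d_j h_{lk} + d_k h_{jl} - d_l h_{jk}
    T = np.transpose(dh, (1, 0, 2)) + np.transpose(dh, (2, 1, 0)) - dh
    return 0.5 * np.einsum("il,ljk->ijk", np.linalg.inv(metric(x)), T)


def helmholtz_metric(rgb):
    """h_ij = delta_ij / (I^i)^2 (log-RGB arclength written in RGB coordinates)."""
    return np.diag(1.0 / np.asarray(rgb, dtype=float) ** 2)


point = np.array([2.0, 3.0, 4.0])
gamma = christoffel(helmholtz_metric, point)
expected = np.zeros((3, 3, 3))
for a in range(3):
    expected[a, a, a] = -1.0 / point[a]            # closed form for this diagonal metric
```

For the Euclidean color metric used later in the paper, all coefficients vanish and the second term of (11) drops out.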

We view scale-space as the gradient descent flow

$$X^i_t \equiv \frac{\partial X^i}{\partial t} = -\frac{1}{2\sqrt{g}}\, h^{il} \frac{\delta F}{\delta X^l}. \tag{14}$$



Notice that we used our freedom to multiply the Euler-Lagrange equations by 
a strictly positive function and a positive definite matrix. This factor is the 
simplest one that does not change the minimization solution while giving a 
reparameterization invariant expression. This choice guarantees that the flow is 
geometric and does not depend on the parameterization. 

Choosing the induced metric and minimizing with respect to the feature coordinates results in a system of coupled PDEs that describe the flow of the image surface inside the spatial-feature space. This flow smooths the regions between edges more rapidly than the edges themselves. This effect is achieved by the projection of the mean curvature vector onto the feature space. Since the normals to the surface at an edge lie almost entirely in the spatial directions, their projection onto the feature space is small and does not change the value or location of an edge.
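For a gray-level image and a Euclidean embedding space, the connection term vanishes and the feature component of (14) becomes $I_t = \Delta_g I$. A minimal explicit-Euler sketch follows; the unit grid spacing, the periodic boundaries, the forward-then-backward differences, the step size, and the name `beltrami_flow` are all our choices for illustration.

```python
import numpy as np


def beltrami_flow(I, dt=0.1, steps=20):
    """Explicit Euler steps of I_t = Delta_g I (gray-level case of Eq. (14),
    Euclidean embedding so Gamma = 0). Unit grid, periodic boundaries."""
    I = np.array(I, dtype=float)
    for _ in range(steps):
        Ix = np.roll(I, -1, axis=1) - I                # forward differences
        Iy = np.roll(I, -1, axis=0) - I
        sqrt_g = np.sqrt(1.0 + Ix ** 2 + Iy ** 2)      # induced metric: g = 1 + |grad I|^2
        Fx, Fy = Ix / sqrt_g, Iy / sqrt_g              # sqrt(g) g^{mu nu} d_nu I
        div = (Fx - np.roll(Fx, 1, axis=1)) + (Fy - np.roll(Fy, 1, axis=0))
        I += dt * div / sqrt_g                         # Laplace-Beltrami update
    return I


rng = np.random.default_rng(1)
noisy = 0.05 * rng.standard_normal((32, 32))
smoothed = beltrami_flow(noisy)
```

Near a strong edge both $1/\sqrt{g}$ factors are small, so the update there is damped, while in flat regions $g \approx 1$ and the step behaves like ordinary heat diffusion. This matches the qualitative behaviour described above.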

This technique was used to denoise and enhance a variety of images: gray-level, color, and 3D images such as movies and volumetric medical images, as well as texture [7,19,18]. Next, we show that it is a useful measure in color image segmentation.



4 Color Segmentation Functional 

According to the Beltrami framework [19], a color image is represented as an embedding map of a two-dimensional Riemannian manifold in a five-dimensional spatial-color Riemannian manifold. The coordinates of the two-dimensional manifold are $(\sigma^1, \sigma^2)$, and those of the five-dimensional one are $(X^1, X^2, X^3, X^4, X^5)$. The embedding map is

$$\{X^1 = \sigma^1,\; X^2 = \sigma^2,\; X^3 = R(\sigma^1,\sigma^2),\; X^4 = G(\sigma^1,\sigma^2),\; X^5 = B(\sigma^1,\sigma^2)\}. \tag{15}$$

We identify $X^1$ with $x$ and $X^2$ with $y$ and, by abuse of notation, write $\{x, y, R(x,y), G(x,y), B(x,y)\}$. Below we also use the notation $I^i$ for $i = r,g,b$ to denote the different color channels.

The metric of the embedding space is

$$ds^2 = dx^2 + dy^2 + ds^2_{\text{color}}, \tag{16}$$

where the metric in the color space is model dependent; see [23] for a general discussion and [20,21] for the analysis of different color models in the Beltrami framework. For the sake of simplicity, we adopt a Euclidean metric for the color space; see [24] for a related effort.

Two different approaches are possible in the treatment of the segmenting 
function. We can think of it as a function on the image manifold or as a function 
on the spatial part of the embedding space. The two approaches lead to some- 
what different equations even though the spatial part and the image manifold 
coordinates are identified in the embedding map. 

4.1 Segmenting function on the image manifold 

The metric in the image manifold is given by the induced metric (see Eq. 
(10)). We assume further that the segmenting function is defined over the two- 
dimensional image manifold, see Figure 1. We use the Polyakov action as an 
adaptive smoothing metric for both the color coordinates and the segmenting 
function. The functional we propose reads 

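$$F[X^i, E] = \int d^2\sigma\, \sqrt{g} \Big( \frac{\alpha}{2} (X^i - X^i_0)\, h_{ij}(X)\, (X^j - X^j_0) + \frac{\beta}{2} E(\sigma^1,\sigma^2)^2\, g^{\mu\nu} (\partial_\mu X^i)(\partial_\nu X^j)\, h_{ij}(X) + \frac{c}{2}\, g^{\mu\nu} (\partial_\mu E)(\partial_\nu E) + \frac{(E-1)^2}{8c} \Big). \tag{17}$$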

We take the color metric to be the unit matrix hij = 6ij from now on. We 
minimize this functional by the gradient descent method. Formally, the equations 
are 

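$$\frac{\partial I^i}{\partial t} = -\frac{1}{\sqrt{g}} \frac{\delta F}{\delta I^i}, \qquad \frac{\partial E}{\partial t} = -\frac{\delta F}{\delta E}.$$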

The functional variations yield the following explicit partial differential equations

$$I^i_t = \beta E^2 \Delta_g(I^i) + \beta\, g^{\mu\nu} (\partial_\mu E^2)(\partial_\nu I^i) - \alpha (I^i - I^i_0),$$

$$E_t = -\sqrt{g} \left( 2\beta E - \frac{1 - E}{4c} - c\, \Delta_g(E) \right),$$

where $i \in \{R, G, B\}$, and



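$$\Delta_g(X) = \frac{1}{\sqrt{g}}\, \partial_\mu\!\left( \sqrt{g}\, g^{\mu\nu} \partial_\nu X \right)$$

is the Laplace-Beltrami operator on the image manifold. The factor 2 in the first term of the equation for $E$ comes from the choice of the metric as the induced one given in Eq. (10). We find that

$$g^{\mu\nu} (\partial_\mu X^i)(\partial_\nu X^j)\, h_{ij} = g^{\mu\nu} g_{\mu\nu} = \mathrm{Tr}(\mathrm{Id}_{2 \times 2}) = 2. \tag{18}$$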

The first term in the equation for $I^i$ smooths the function when $E = 1$ and is ineffective around an edge, where $E$ approaches zero. The second term sharpens gradients and creates shocks. The last term pushes $I^i$ towards $I^i_0$.
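Identity (18) holds pointwise for any color gradients, because with the induced metric $g_{\mu\nu} = (\partial_\mu X^i)(\partial_\nu X^i)$ the contraction is just $\mathrm{Tr}(g^{-1} g)$. The following sketch checks this for the five-dimensional embedding (15) with a Euclidean color metric; the name `contraction_eq18` and the random test gradients are illustrative assumptions.

```python
import numpy as np


def contraction_eq18(J):
    """g^{mu nu} (d_mu X^i)(d_nu X^j) h_ij for X = (x, y, R, G, B), h_ij = delta_ij.

    J[i, mu] = d_mu I^i for the three color channels at a single pixel.
    """
    D = np.vstack([np.eye(2), np.asarray(J, dtype=float)])  # (5, 2) Jacobian d_mu X^i
    g = D.T @ D                                              # induced metric, Eq. (10)
    g_inv = np.linalg.inv(g)
    return float(np.einsum("mn,im,in->", g_inv, D, D))


rng = np.random.default_rng(2)
values = [contraction_eq18(rng.normal(scale=s, size=(3, 2))) for s in (0.01, 1.0, 100.0)]
```

The result is 2 regardless of how steep the color edges are, which is why the first term of the $E$ equation carries the constant factor 2.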




Fig. 1. Left: the edge indicator function $E$ defined over the image plane $\{x, y\}$. Right: the edge indicator function $E$ defined on the image surface manifold $\{x, y, I(x,y)\}$.



4.2 Segmenting function on the embedding space 

The metric is, as before, the induced metric, but this time the Polyakov action is used only for the feature coordinates. The segmenting function is defined over the Euclidean spatial part of the embedding space and is therefore smoothed using the usual $L_2$ norm. The functional, in this case, is

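$$F^S[I^i, E] = \int d^2\sigma\, \sqrt{g} \Big( \frac{\alpha}{2} (X^i - X^i_0)\, h_{ij}(X)\, (X^j - X^j_0) + \frac{\beta}{2} E(x,y)^2\, g^{\mu\nu} (\partial_\mu X^i)(\partial_\nu X^j)\, h_{ij}(X) \Big) + \int dx\,dy \left( \frac{c}{2} |\nabla E|^2 + \frac{(E-1)^2}{8c} \right). \tag{19}$$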

The gradient descent equations are

$$I^i_t = \beta E^2 \Delta_g(I^i) + \beta\, g^{\mu\nu} (\partial_\mu E^2)(\partial_\nu I^i) - \alpha (I^i - I^i_0),$$

$$E_t = -2\beta \sqrt{g}\, E + \frac{1 - E}{4c} + c\, \Delta(E),$$

where $i \in \{R, G, B\}$ and $\Delta(E)$ is the usual Laplacian. The first term in the equation for $E$ decreases the values of $E$ where $g$ is large. The second term pushes the values of $E$ toward 1 as $c$ approaches zero. The last term is a smoothing term.
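The sketch below performs one explicit Euler step of the Section 4.2 system for an RGB image. It assumes the equations take the form $I^i_t = \beta E^2 \Delta_g I^i + \beta g^{\mu\nu}(\partial_\mu E^2)(\partial_\nu I^i) - \alpha(I^i - I^i_0)$ and $E_t = -2\beta\sqrt{g}E + (1-E)/(4c) + c\Delta E$, with a Euclidean color metric. Central differences via `np.gradient`, the unit grid, and the names `segmentation_step` and `_grad` are our assumptions; Section 5 describes a forward/backward difference scheme instead.

```python
import numpy as np


def _grad(f):
    fy, fx = np.gradient(f)          # central differences on a unit grid (assumption)
    return fx, fy


def segmentation_step(I, I0, E, alpha, beta, c, dt):
    """One explicit Euler step; I, I0 have shape (H, W, 3), E has shape (H, W)."""
    grads = [_grad(I[..., k]) for k in range(I.shape[-1])]
    g11 = 1.0 + sum(fx * fx for fx, _ in grads)            # induced metric g_{mu nu}
    g12 = sum(fx * fy for fx, fy in grads)
    g22 = 1.0 + sum(fy * fy for _, fy in grads)
    det = g11 * g22 - g12 ** 2
    sqrt_g = np.sqrt(det)
    a11, a12, a22 = g22 / det, -g12 / det, g11 / det        # inverse metric g^{mu nu}
    E2x, E2y = _grad(E ** 2)
    I_new = np.empty_like(I)
    for k, (fx, fy) in enumerate(grads):
        px = sqrt_g * (a11 * fx + a12 * fy)                 # sqrt(g) g^{1 nu} d_nu I^k
        py = sqrt_g * (a12 * fx + a22 * fy)                 # sqrt(g) g^{2 nu} d_nu I^k
        lap_g = (_grad(px)[0] + _grad(py)[1]) / sqrt_g      # Laplace-Beltrami of I^k
        coupling = a11 * E2x * fx + a12 * (E2x * fy + E2y * fx) + a22 * E2y * fy
        I_t = beta * (E ** 2 * lap_g + coupling) - alpha * (I[..., k] - I0[..., k])
        I_new[..., k] = I[..., k] + dt * I_t
    Ex, Ey = _grad(E)
    lap_E = _grad(Ex)[0] + _grad(Ey)[1]                     # usual Laplacian of E
    E_t = -2.0 * beta * sqrt_g * E + (1.0 - E) / (4.0 * c) + c * lap_E
    return I_new, E + dt * E_t


# Hypothetical parameters; the paper's exact values are not recoverable here.
I0 = np.full((6, 6, 3), 0.5)
I1, E1 = segmentation_step(I0.copy(), I0, np.ones((6, 6)), alpha=0.01, beta=0.02, c=0.1, dt=0.1)
```

On a constant image $\sqrt{g} = 1$, the color channels stay put and $E$ drops by exactly $2\beta\,dt$ in one step, which is a quick sanity check of the signs.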
5 Experimental results




Fig. 2. Upper row, left to right: the original noisy image, followed by the final edge indicator function $E$, and the final image. At the bottom are zoom-in frames of a square section cropped from the initial and final images.



We tested both cases, where the segmentation function $E$ is defined on the image manifold and on the embedding space. The time derivatives are approximated by an explicit forward numerical approximation (Euler scheme). The spatial derivatives were taken first by a forward and then by a backward approximation, see [17]. This is a simple way to keep the numerical support tight and centralized. The examples demonstrate color image enhancement for both noisy and clean images. In all examples we set $\alpha = 7 \cdot 10^{-\ldots}$, $\beta = 2 \cdot 10^{-\ldots}$, $c = 10^{-\ldots}$. We also decreased the value of $c$ along the iterations by setting $c^{n+1} = c^n / 1.002$, as proposed in [15].
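As a quick worked check of the annealing rule in which $c$ is divided by 1.002 at every iteration (our own arithmetic, not a figure reported in the text), the value after $n$ iterations and the number of iterations needed to halve $c$ are

$$c^{n} = c^{0}\,(1.002)^{-n}, \qquad n_{1/2} = \frac{\ln 2}{\ln 1.002} \approx 346.9,$$

so $c$ is halved roughly every 347 iterations and reduced tenfold after about $\ln 10 / \ln 1.002 \approx 1152$ iterations.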

In Figure 2 we use the segmentation function $E$ on the flat image manifold. The embedding space was taken to be Euclidean in color space. Figure 3 tests the segmentation function $E$ on the embedding space. This example turns a clean benchmark image into a piecewise smooth one. Here the embedding space is based on Helmholtz's arclength in color, $ds^2_{\text{color}} = (d\log R)^2 + (d\log G)^2 + (d\log B)^2$, see also [5,22,23,20]. In some cases the edges appear as 'edge regions' rather than
Fig. 3. Upper row, left to right: The original image, followed by the final edge indicator
function E, and the final image. At the bottom are zoom-in frames of a square section 
cropped from the initial and the final images. 
the expected one-dimensional curves. The reason is our numerical approximation for $E$: we use an edge image with the same resolution as the original image, and adding the central difference approximation yields the edge regions. One possible solution is to apply the refined numerical approximation to the edge map as in [15]. Finally, in Figure 4 we apply the segmentation function $E$ in the embedding space to a noisy image. The noise comes from digital camera compression distortions, followed by scanning a printout of the picture.
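Helmholtz's color arclength is Euclidean in the coordinates $(\log R, \log G, \log B)$, so geodesic distances between colors are ordinary norms of log differences. The sketch below (hypothetical name `helmholtz_distance`, strictly positive channel values assumed) illustrates this and the resulting invariance to a common scaling of all channels.

```python
import numpy as np


def helmholtz_distance(c1, c2):
    """Geodesic length between two positive RGB triplets for
    ds^2 = (d log R)^2 + (d log G)^2 + (d log B)^2.

    In (log R, log G, log B) coordinates this metric is Euclidean, so the
    geodesic is a straight line there and its length is a plain norm.
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    return float(np.linalg.norm(np.log(c2) - np.log(c1)))


d_unit = helmholtz_distance([1.0, 1.0, 1.0], [np.e, np.e, np.e])
d_small = helmholtz_distance([1.0, 2.0, 3.0], [2.0, 2.0, 6.0])
d_scaled = helmholtz_distance([10.0, 20.0, 30.0], [20.0, 20.0, 60.0])  # same pair, 10x brighter
```

Scaling both colors by the same factor shifts every log coordinate by the same amount, so the distance does not change.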




Fig. 4. The original noisy image is on the left, followed by the edge indicator field $E$ and the final result. The bottom row shows a zoom-in on the original noisy image (left) and on the filtered image.



6 Summary and Conclusions 

We presented a generalization of the Mumford-Shah enhancement and segmentation method. The generalization has two aspects: multi-channel images, i.e. color images, are analyzed, and the $L_2$ measure is replaced by the Polyakov action. The generalization is a natural application of the Beltrami framework, which represents images as an embedding map of the image manifold in a spatial-feature space.
Abstract. We present a supervised classification model based on a variational approach. This model aims to find an optimal partition composed of homogeneous classes with regular interfaces. We represent the regions of the image defined by the classes and their interfaces by level set functions, and we define a functional whose minimum is an optimal partition. The coupled Partial Differential Equations (PDEs) related to the minimization of the functional are considered through a dynamical scheme. Given an initial interface set (zero level set), the different terms of the PDEs govern the motion of the interfaces such that, at convergence, we get an optimal partition as defined above. Each interface is guided by internal forces (regularity of the interface) and external ones (data term, no vacuum, no region overlapping). Several experiments were conducted on both synthetic and real images.



1 Introduction 

Image classification, which consists of assigning a label to each pixel of an observed image, is one of the basic problems in image processing. It concerns many applications, for instance land use management in remote sensing. The classification problem is closely related to the segmentation problem, in the sense that we want to get a partition composed of homogeneous regions. Nevertheless, within the classification procedure, each element of the partition represents a class, i.e. a set of pixels with the same label. In the following, the feature criterion we are interested in is the spatial distribution of intensity (or grey level). This work takes place in the general framework of supervised classification, which means that the number and the parameters of the classes are known. The proposed method could be extended to discriminant features other than grey level, such as texture. The unsupervised case, including a parameter estimation capability, will be studied in the future.
Many classification models have been developed, based either on structural notions such as region growing methods [9] or on stochastic approaches as in [1], but rarely within the variational framework. In [11] we proposed a supervised variational classification model based on Cahn-Hilliard models, such that the solution we get is composed of homogeneous regions separated by regularized boundaries. The classes are considered as phases separated by interfaces. The model was developed through regularity considerations on the phases, by defining a set of functionals whose minimum at convergence is expected to be an image with the desired regularity properties.

Herein, the approach is different, mainly because the proposed model is based on active contours, and the functional of interest is defined over the regions with associated interfaces through a level set model. The resulting dynamical Partial Differential Equations (PDEs), governing the evolution of the set of interfaces, describe a moving front converging to a regularized partition. This model is inspired by the work of Zhao et al. on multiphase evolution [13], and takes place in the general framework of active contours [2,3,7] for region segmentation [14]. We use a level set formulation [8], which is convenient for writing functionals depending on regions and contours, and allows changes of topology of the evolving fronts. Each active interface is coupled to the others through a term which penalizes overlapping regions (i.e. pixels with two labels) and the formation of vacuum (i.e. pixels without any label). The evolution of each interface is guided by forces which impose the following constraints: the interface exhibits a minimal perimeter (internal force) and it encloses one and only one homogeneous class (external force).

First, we state the problem of classification as a partitioning problem. We set the framework and define the properties we expect from the classification. Second, we present the classification statement through a level set formulation. The Euler-Lagrange equations of the proposed functional lead to a dynamical scheme, which we propose to implement. Finally, we present some experimental results on both synthetic and real images (see also [10] for more experiments).

2 Image classification as a partitioning problem 

This section presents the properties we want the classification model to satisfy. In the following, we consider a classification problem in which we search for a partition of the observed data $u_0$ with respect to the predefined classes. This partition is composed of homogeneous regions, namely the classes, separated by regularized interfaces. Herein, we suppose that the classes have a Gaussian distribution of intensity; therefore a class is characterized by its mean $\mu_i$ and its standard deviation $\sigma_i$. The number $K$ of classes and the parameters $(\mu_i, \sigma_i)_{i=1\ldots K}$ are supposed to be given by a previous estimation. We choose to assign the label value $\mu_i$ to each element of the $i$-th class. All indices $i$ or $j$ range from 1 to $K$. The proposed method is not limited to images in which intensity homogeneity is a good classifier. The same approach could be used to classify data according to a texture parameter, for example, or other discriminant attributes.
Fig. 1. A partition of



One can think of first computing the discriminant attributes on the observed data, then determining the classes using an algorithm that gives the number of classes and their parameters. We can then assume that the distribution inside each class is Gaussian and apply the proposed algorithm in order to determine a regularized classification.

Let $\Omega$ be an open domain subset of $\mathbb{R}^2$ with smooth boundary, and let $u_0 : \Omega \to \mathbb{R}$ represent the observed data function. Let $\Omega_i$ be the region defined as

$$\Omega_i = \{x \in \Omega \,/\, x \text{ belongs to the } i\text{-th class}\}. \tag{1}$$

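A partitioning of $\Omega$ consists of finding a set $\{\Omega_i\}_{i=1\ldots K}$ such that (see Fig. 1)

$$\Omega = \bigcup_{i=1}^{K} \Omega_i \quad \text{and} \quad \Omega_i \cap \Omega_j = \emptyset, \; i \neq j. \tag{2}$$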

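We denote by $\Gamma_i = \partial\Omega_i \cap \Omega$ the intersection of the boundary of $\Omega_i$ with the open domain $\Omega$, and let the interface between $\Omega_i$ and $\Omega_j$ be

$$\Gamma_{ij} = \Gamma_{ji} = \Gamma_i \cap \Gamma_j \cap \Omega, \quad \forall i \neq j. \tag{3}$$

We have

$$\Gamma_i = \bigcup_{j \neq i} \Gamma_{ij}. \tag{4}$$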

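Let us remark that in (3) and (4) we may have $\Gamma_{ij} = \emptyset$. We denote by $|\Gamma_i|$ the one-dimensional Hausdorff measure of $\Gamma_i$, which verifies

$$|\Gamma_i| = \sum_{j \neq i} |\Gamma_{ij}| \quad \text{and} \quad |\emptyset| = 0. \tag{5}$$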

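The classification model we consider for an image $u_0$ defined over $\Omega$ is a set $\{\Omega_i\}_i$ defined by (1) and satisfying:

Condition A: $\{\Omega_i\}_i$ is a partition of $\Omega$:

$$\Omega = \bigcup_i \Omega_i \quad \text{and} \quad \Omega_i \cap \Omega_j = \emptyset, \; i \neq j.$$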
Condition B: The partition $\{\Omega_i\}_i$ is a classification of the observed data $u_0$ and takes into account the Gaussian distribution property of the classes (data term):

$$\text{minimize} \quad \sum_i \int_{\Omega_i} \frac{(u_0 - \mu_i)^2}{\sigma_i^2}\, dx \quad \text{with respect to } \Omega_i.$$



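Condition C: The partition is regular in the sense that the weighted sum of the lengths of the interfaces is minimal:

$$\text{minimize} \quad \sum_{i \neq j} \xi_{ij} |\Gamma_{ij}| \quad \text{with respect to } \Gamma_{ij} \quad (\text{the } \xi_{ij} \in \mathbb{R} \text{ are fixed}).$$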

The solution of the classification model proposed in the next section has to take these three conditions into account. This is done by associating a functional with the set of interfaces such that its minimizers respect conditions A, B and C.
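The three conditions can be checked directly on a candidate partition given as boolean masks. The sketch below is illustrative only (hypothetical function names, unit pixel area): it counts vacuum and overlap pixels for Condition A and evaluates the data term of Condition B. It also shows the pixelwise minimizer of the data term alone, which is what one obtains when the regularity of Condition C is ignored.

```python
import numpy as np


def partition_defects(masks):
    """Condition A: numbers of vacuum pixels and of overlap pixels."""
    counts = np.asarray(masks, dtype=int).sum(axis=0)
    return int((counts == 0).sum()), int((counts > 1).sum())


def data_term(u0, masks, mu, sigma):
    """Condition B: sum over i of the integral over Omega_i of (u0 - mu_i)^2 / sigma_i^2."""
    return float(sum((((u0 - m) / s) ** 2)[np.asarray(mk, dtype=bool)].sum()
                     for mk, m, s in zip(masks, mu, sigma)))


def pixelwise_labels(u0, mu, sigma):
    """Minimizer of the data term alone (no length term): per-pixel argmin."""
    costs = np.stack([((u0 - m) / s) ** 2 for m, s in zip(mu, sigma)])
    return np.argmin(costs, axis=0)


u0 = np.array([[100.0, 128.0], [160.0, 101.0]])
mu, sigma = [100.0, 128.0, 160.0], [10.0, 10.0, 10.0]
labels = pixelwise_labels(u0, mu, sigma)
masks = np.stack([labels == k for k in range(len(mu))])
```

Without the length term, such a labeling is exactly a partition, but on noisy data it would be fragmented; this is what Condition C and the level set formulation below are meant to prevent.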

3 Multiphase model: image classification in terms of level sets

The classification model developed below is based on coupled active interfaces, and the approach we adopt is inspired by Zhao et al. [13]. The evolution of each interface is guided by forces constraining the solution to respect conditions A, B and C stated in the previous section. We use a level set formulation to represent each interface and also each region $\Omega_i$ of the partition $\{\Omega_i\}_i$.

3.1 Preliminaries 

Let $\Phi_i : \Omega \times \mathbb{R}^+ \to \mathbb{R}$ be a Lipschitz function associated with region $\Omega_i$ (we assume the existence of such a $\Phi_i$) such that

$$\Phi_i(x;t) \begin{cases} > 0 & \text{if } x \in \Omega_i \\ = 0 & \text{if } x \in \Gamma_i \\ < 0 & \text{otherwise.} \end{cases} \tag{6}$$

Thus, the region $\Omega_i$ is entirely described by the function $\Phi_i$. In the following, for the sake of clarity, we will sometimes omit the spatial parameter $x$ and the time parameter $t$ in $\Phi_i(x;t)$.

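Let us define the approximations $\delta_\alpha$ and $H_\alpha$ of the Dirac and Heaviside distributions, with $\alpha \in \mathbb{R}^{+*}$:

$$\delta_\alpha(s) = \begin{cases} \dfrac{1}{2\alpha}\left(1 + \cos\dfrac{\pi s}{\alpha}\right) & \text{if } |s| \le \alpha \\ 0 & \text{if } |s| > \alpha \end{cases} \qquad H_\alpha(s) = \begin{cases} \dfrac{1}{2}\left(1 + \dfrac{s}{\alpha} + \dfrac{1}{\pi}\sin\dfrac{\pi s}{\alpha}\right) & \text{if } |s| \le \alpha \\ 1 & \text{if } s > \alpha \\ 0 & \text{if } s < -\alpha \end{cases} \tag{7}$$

and we have

$$\delta_\alpha \to \delta \ \text{in } \mathcal{D}'(\Omega) \text{ as } \alpha \to 0^+, \qquad H_\alpha \to H \ \text{in } \mathcal{D}'(\Omega) \text{ as } \alpha \to 0^+, \tag{8}$$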



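where $\mathcal{D}'(\Omega)$ is the space of distributions defined over $\Omega$. From (6), (7) and (8) we can write

$$\{x \in \Omega \,/\, \lim_{\alpha \to 0^+} H_\alpha(\Phi_i(x;t)) = 1\} = \Omega_i, \tag{9}$$

$$\{x \in \Omega \,/\, \lim_{\alpha \to 0^+} \delta_\alpha(\Phi_i(x;t)) \neq 0\} = \Gamma_i. \tag{10}$$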
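A direct NumPy transcription of the regularized Dirac and Heaviside functions of (7) makes it easy to check numerically that $\delta_\alpha$ has unit mass and that $H_\alpha$ is its primitive. The function names are our own; this is a sketch, not the authors' code.

```python
import numpy as np


def dirac_alpha(s, alpha):
    """Regularized Dirac delta_alpha of Eq. (7)."""
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= alpha, (1.0 + np.cos(np.pi * s / alpha)) / (2.0 * alpha), 0.0)


def heaviside_alpha(s, alpha):
    """Regularized Heaviside H_alpha of Eq. (7); its derivative is delta_alpha."""
    s = np.asarray(s, dtype=float)
    inner = 0.5 * (1.0 + s / alpha + np.sin(np.pi * s / alpha) / np.pi)
    return np.where(s > alpha, 1.0, np.where(s < -alpha, 0.0, inner))


alpha = 0.1
s = np.linspace(-2 * alpha, 2 * alpha, 40001)
mass = float(dirac_alpha(s, alpha).sum() * (s[1] - s[0]))     # Riemann sum of delta_alpha
s0, ds = 0.03, 1e-6
slope = float((heaviside_alpha(s0 + ds, alpha) - heaviside_alpha(s0 - ds, alpha)) / (2 * ds))
```

Both functions have support or transition width $2\alpha$, which is what later produces the "natural" narrow band around each zero level set.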



3.2 Multiphase functional 

Let $u_0 : \Omega \to \mathbb{R}$ be the observed data (grey level, for instance).

Thanks to the level set functions $\Phi_i$ defined in (6) and by the use of (9) and (10), a partition $\{\Omega_i\}_i$ respecting conditions A, B and C stated in Section 2 can be found through the minimization of a global functional depending on the $\Phi_i$'s. This functional contains three terms, each one related to one of the three conditions. In the following, we express each condition in terms of functional minimization. Minimizers of the following functionals are assumed to exist.



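• Functional related to condition A (partition condition):
Let us define the following functional:

$$F^A_\alpha(\Phi_1, \ldots, \Phi_K) = \frac{\lambda}{2} \int_\Omega \left( \sum_{i=1}^{K} H_\alpha(\Phi_i) - 1 \right)^2 dx \quad \text{with } \lambda \in \mathbb{R}^+. \tag{11}$$

The minimization of $F^A_\alpha$, as $\alpha \to 0^+$, penalizes the formation of vacuum (pixels with no label) and of overlapping regions (pixels with more than one label).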



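• Functional related to condition B (data term):

Taking into account the observed data and the Gaussian distribution property of the classes, we consider:

$$F^B_\alpha(\Phi_1, \ldots, \Phi_K) = \sum_{i=1}^{K} e_i \int_\Omega H_\alpha(\Phi_i)\, \frac{(u_0 - \mu_i)^2}{\sigma_i^2}\, dx \quad \text{with } e_i \in \mathbb{R}, \ \forall i. \tag{12}$$

The family $\{\Phi_i\}_i$ minimizing $F^B_\alpha$ as $\alpha \to 0^+$ leads to a partition $\{\Omega_i\}_i$ satisfying condition B.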
• Functional related to condition C (length shortening of the interface set):

The last functional we want to introduce is related to condition C, the minimization of the interface lengths. We would like to minimize

$$\frac{1}{2} \sum_{i \neq j} \xi_{ij}\, |\Gamma_{ij}| \quad \text{with } \xi_{ij} \text{ being real constants.} \tag{13}$$



The factor $\frac{1}{2}$ expresses the symmetry $\Gamma_{ij} = \Gamma_{ji}$ and will be absorbed into the weighting parameters. We turn the minimization of interface lengths into the minimization of boundary lengths:

$$\sum_{i=1}^{K} \gamma_i\, |\Gamma_i| \quad \text{with } \gamma_i \text{ being real constants.} \tag{14}$$

From (13) and (14) we obtain the constraint $\xi_{ij} = \gamma_i + \gamma_j$, which allows us to select the weighting parameters $\gamma_i$ in the boundary length minimization problem so as to retrieve the interface length minimization problem. According to the Lemma stated below, the minimization of (14) is achieved by minimizing the functional (as $\alpha \to 0^+$):



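$$F^C_\alpha(\Phi_1, \ldots, \Phi_K) = \sum_{i=1}^{K} \gamma_i \int_\Omega \delta_\alpha(\Phi_i)\, |\nabla \Phi_i|\, dx. \tag{15}$$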
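The constraint $\xi_{ij} = \gamma_i + \gamma_j$ can be checked directly from (4) and (5); the computation makes the omitted step explicit:

$$\sum_{i=1}^{K} \gamma_i |\Gamma_i| = \sum_{i=1}^{K} \gamma_i \sum_{j \neq i} |\Gamma_{ij}| = \frac{1}{2} \sum_{i \neq j} \left( \gamma_i |\Gamma_{ij}| + \gamma_j |\Gamma_{ji}| \right) = \frac{1}{2} \sum_{i \neq j} (\gamma_i + \gamma_j)\, |\Gamma_{ij}|,$$

where the symmetrization uses $|\Gamma_{ij}| = |\Gamma_{ji}|$. Comparing with (13) term by term gives $\xi_{ij} = \gamma_i + \gamma_j$ for $i \neq j$.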



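Lemma: With the previous definitions, let us define

$$L_\alpha(\Phi_i) = \int_\Omega \delta_\alpha(\Phi_i(x;t))\, |\nabla \Phi_i(x;t)|\, dx;$$

then we have

$$\lim_{\alpha \to 0^+} L_\alpha(\Phi_i) = \int_{\Phi_i = 0} ds = |\Gamma_i|.$$

Proof: Using the coarea formula [4], we have

$$L_\alpha(\Phi_i) = \int_{\mathbb{R}} \left( \int_{\Phi_i = \rho} \delta_\alpha(\Phi_i(x;t))\, ds \right) d\rho = \int_{\mathbb{R}} \delta_\alpha(\rho) \left( \int_{\Phi_i = \rho} ds \right) d\rho.$$

By setting $h(\rho) = \int_{\Phi_i = \rho} ds$ we obtain

$$L_\alpha(\Phi_i) = \int_{\mathbb{R}} \delta_\alpha(\rho)\, h(\rho)\, d\rho = \frac{1}{2\alpha} \int_{-\alpha}^{\alpha} \left( 1 + \cos\frac{\pi\rho}{\alpha} \right) h(\rho)\, d\rho.$$

If we take $\theta = \rho / \alpha$ we have

$$L_\alpha(\Phi_i) = \frac{1}{2} \int_{-1}^{1} \left( 1 + \cos(\pi\theta) \right) h(\alpha\theta)\, d\theta.$$

Thus, when $\alpha \to 0$ we obtain

$$\lim_{\alpha \to 0} L_\alpha(\Phi_i) = \frac{h(0)}{2} \int_{-1}^{1} \left( 1 + \cos(\pi\theta) \right) d\theta = h(0) = |\Gamma_i|.$$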
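The Lemma can be illustrated numerically. For a circle of radius $r_0$ represented by the signed distance $\Phi = r_0 - |x|$, the level set $\{\Phi = \rho\}$ has length $h(\rho) = 2\pi(r_0 - \rho)$, which is linear in $\rho$. Since $\delta_\alpha$ is even, $L_\alpha(\Phi)$ then equals $2\pi r_0$ for every $\alpha < r_0$, up to discretization error. The grid, the central differences, and the helper names below are our own assumptions.

```python
import numpy as np


def dirac_alpha(s, alpha):
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= alpha, (1.0 + np.cos(np.pi * s / alpha)) / (2.0 * alpha), 0.0)


def length_alpha(phi, alpha, h):
    """L_alpha(phi): integral of delta_alpha(phi) |grad phi| on a grid of spacing h."""
    d0, d1 = np.gradient(phi, h)
    return float((dirac_alpha(phi, alpha) * np.hypot(d0, d1)).sum() * h * h)


h = 0.005
x = np.arange(-1.0, 1.0, h) + h / 2             # cell centres covering [-1, 1]
X, Y = np.meshgrid(x, x)
phi = 0.5 - np.hypot(X, Y)                       # signed distance, positive inside radius 0.5
lengths = {a: length_alpha(phi, a, h) for a in (0.2, 0.1, 0.05)}
perimeter = 2 * np.pi * 0.5
```

For shapes whose level set lengths vary nonlinearly with $\rho$, $L_\alpha$ only approaches $|\Gamma_i|$ in the limit, as the Lemma states.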



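• Global functional:

The sum $F^A_\alpha + F^B_\alpha + F^C_\alpha$ leads to the global functional:

$$F_\alpha(\Phi_1, \ldots, \Phi_K) = \frac{\lambda}{2} \int_\Omega \left( \sum_{i=1}^{K} H_\alpha(\Phi_i) - 1 \right)^2 dx + \sum_{i=1}^{K} e_i \int_\Omega H_\alpha(\Phi_i)\, \frac{(u_0 - \mu_i)^2}{\sigma_i^2}\, dx + \sum_{i=1}^{K} \gamma_i \int_\Omega \delta_\alpha(\Phi_i)\, |\nabla \Phi_i|\, dx. \tag{16}$$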



As $\alpha \to 0^+$, the solution set $\{\Phi_i\}_i$ minimizing $F_\alpha(\Phi_1, \ldots, \Phi_K)$, if it exists¹, defines according to (6) a classification composed of homogeneous classes (the so-called $\Omega_i$ phases) separated by regularized interfaces.
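A discrete version of the global functional (16) is a few lines of NumPy. This sketch assumes unit pixels by default, central differences for $\nabla\Phi_i$, and a factor $\lambda/2$ in front of the vacuum/overlap term; the function names are our own. The two usage cases check the extremes: a clean partition far from any interface costs nothing, while a fully unlabeled image pays $\lambda/2$ per pixel.

```python
import numpy as np


def dirac_alpha(s, alpha):
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= alpha, (1.0 + np.cos(np.pi * s / alpha)) / (2.0 * alpha), 0.0)


def heaviside_alpha(s, alpha):
    s = np.asarray(s, dtype=float)
    inner = 0.5 * (1.0 + s / alpha + np.sin(np.pi * s / alpha) / np.pi)
    return np.where(s > alpha, 1.0, np.where(s < -alpha, 0.0, inner))


def global_functional(phis, u0, mu, sigma, e, gamma, lam, alpha, h=1.0):
    """Discrete F_alpha of Eq. (16); phis has shape (K, H, W)."""
    phis = np.asarray(phis, dtype=float)
    H = heaviside_alpha(phis, alpha)
    f_a = 0.5 * lam * ((H.sum(axis=0) - 1.0) ** 2).sum()          # vacuum / overlap
    f_b = sum(e[i] * (H[i] * (u0 - mu[i]) ** 2 / sigma[i] ** 2).sum()
              for i in range(len(phis)))                          # data term
    f_c = 0.0
    for i, phi in enumerate(phis):                                # weighted lengths
        d0, d1 = np.gradient(phi, h)
        f_c += gamma[i] * (dirac_alpha(phi, alpha) * np.hypot(d0, d1)).sum()
    return float((f_a + f_b + f_c) * h * h)


u0 = np.full((4, 4), 100.0)
args = dict(u0=u0, mu=[100.0, 200.0], sigma=[10.0, 10.0], e=[1.0, 1.0],
            gamma=[0.1, 0.1], lam=5.0, alpha=0.01)
perfect = global_functional(np.stack([np.ones((4, 4)), -np.ones((4, 4))]), **args)
empty = global_functional(np.stack([-np.ones((4, 4)), -np.ones((4, 4))]), **args)
```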



3.3 Remark about length minimization

Consider the length functional:

$$L(t) = \int_0^1 \left| \frac{\partial C(p;t)}{\partial p} \right| dp, \tag{17}$$

where $\{C(p;t)\}_t$ is a set of closed parametrized curves ($p \in [0;1]$) over $\Omega$ such that $C(0;t) = C(1;t)$ and $\frac{\partial C}{\partial p}(0;t) = \frac{\partial C}{\partial p}(1;t)$. Then $L(t)$ decreases most rapidly if

$$\frac{\partial C(p;t)}{\partial t} = \kappa N, \tag{18}$$

$\kappa$ being the local curvature of $C(p;t)$ and $N$ the inward normal. Curve evolution through PDE (18) is known as mean curvature motion (see [6] for instance). Active contours guided by (18) tend to regular curves in the sense that their length is minimized. PDE (18) can be written through a level set formulation [8], which is more convenient for handling curve breaking and merging. Assume that $d : \Omega \times \mathbb{R}^+ \to \mathbb{R}$ is a smooth continuous function such that, from the value of $d(x;t)$, we can determine whether $x$ is interior, exterior or belongs to $C(p;t)$. Let us suppose that $C(p;t) = \{x \in \Omega \,/\, d(x;t) = a\}$ (i.e. the contour is represented by the level set $a$ of function $d$). Formulated with level sets, PDE (18) becomes

$$\frac{\partial d(x;t)}{\partial t} = |\nabla d|\, \mathrm{div}\!\left( \frac{\nabla d}{|\nabla d|} \right), \tag{19}$$



¹ If they exist, minimizers $\{\Phi_i\}_i$ should be found in the space $\{\Phi_i : \Omega \times \mathbb{R}^+ \to \mathbb{R} \,/\, |\nabla \Phi_i| \in L^1(\Omega)\}$.
with $\mathrm{div}(\nabla d / |\nabla d|)$ being the local curvature of level set $a$. Equation (19) was studied, for instance, in [5]. Evolving the level sets of function $d$ (and thus the contour $C(p;t)$ through level set $a$) by (19) is the level set formulation of mean curvature motion. The level set formulation allows fronts to break and merge, which is not possible with formulation (18). Since contour $C(p;t)$ is represented by level set $a$, we only need to update PDE (19) in a narrow band around $a$. In this case, the level set formulation (19) comes from a reformulation of (18) to track the motion of contours $C(p;t)$. In our case, we directly define a length functional over the contours $\Gamma_i$ by the use of the level set functions $\Phi_i$. The associated Euler-Lagrange equations lead to $K$ PDEs of the form

$$\frac{\partial \Phi_i(x;t)}{\partial t} = \delta_\alpha(\Phi_i)\, \mathrm{div}\!\left( \frac{\nabla \Phi_i}{|\nabla \Phi_i|} \right). \tag{20}$$



Compared to PDE (19), we get from (20) a "natural" narrow band from the Dirac operator $\delta_\alpha$, whose width depends on the value of $\alpha$ (for $\Phi_i$'s defined as signed distance functions in (6)).
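The curvature term $\mathrm{div}(\nabla\Phi/|\nabla\Phi|)$ that drives (19) and (20) is easy to compute with central differences. The sketch below is our own illustration (hypothetical name `level_set_curvature`, a small `eps` to avoid division by zero). It evaluates the term for a circle of radius 0.5 with $\Phi$ positive inside, as in (6). Analytically the term equals $-1/r$, so (20) lowers $\Phi$ near the contour and the circle shrinks, as mean curvature motion should.

```python
import numpy as np


def level_set_curvature(phi, h, eps=1e-12):
    """div(grad phi / |grad phi|) with central differences; eps avoids 0/0."""
    d0, d1 = np.gradient(phi, h)
    norm = np.sqrt(d0 ** 2 + d1 ** 2) + eps
    return np.gradient(d0 / norm, h, axis=0) + np.gradient(d1 / norm, h, axis=1)


h = 0.01
x = np.arange(-1.0, 1.0 + h / 2, h)
X, Y = np.meshgrid(x, x, indexing="ij")
r = np.hypot(X, Y)
phi = 0.5 - r                               # positive inside the circle, as in Eq. (6)
kappa = level_set_curvature(phi, h)
ring = np.abs(r - 0.5) < h                  # pixels next to the zero level set
mean_kappa = float(kappa[ring].mean())      # analytically -1/r, about -2 on this ring
```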



4 Multiphase evolution scheme 

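Using Neumann conditions ($\frac{\partial \Phi_i}{\partial n}(x;t) = 0, \ \forall x \in \partial\Omega$), the Euler-Lagrange equations associated with $F_\alpha$ give the following $K$ coupled PDEs ($i = 1 \ldots K$):

$$\delta_\alpha(\Phi_i) \left[ e_i \frac{(u_0 - \mu_i)^2}{\sigma_i^2} - \gamma_i\, \mathrm{div}\!\left( \frac{\nabla \Phi_i}{|\nabla \Phi_i|} \right) + \lambda \left( \sum_{j=1}^{K} H_\alpha(\Phi_j) - 1 \right) \right] = 0, \tag{21}$$

with div denoting the divergence operator, and $\mathrm{div}(\nabla \Phi_i / |\nabla \Phi_i|)$ being the (mean) curvature of the level set of $\Phi_i$ at point $x$. We note that the term $\delta_\alpha(\Phi_i)$ in (21) delimits a "natural" band in which the $i$-th PDE is nonzero (for $\Phi_i$'s being signed distance functions): $\{x \in \Omega \,/\, |\Phi_i(x;t)| < \alpha\}$.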

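We embed (21) into a dynamical scheme and get a system of $K$ coupled equations ($i = 1 \ldots K$), where $dt$ is the time step:

$$\Phi_i^{t+1} = \Phi_i^{t} - dt\, \delta_\alpha(\Phi_i^{t}) \left[ e_i \frac{(u_0 - \mu_i)^2}{\sigma_i^2} - \gamma_i\, \mathrm{div}\!\left( \frac{\nabla \Phi_i^{t}}{|\nabla \Phi_i^{t}|} \right) + \lambda \left( \sum_{j=1}^{K} H_\alpha(\Phi_j^{t}) - 1 \right) \right]. \tag{22}$$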



Let us remark that we initially set the $\Phi_i$'s to signed distance functions, which is common practice for level set schemes. But, as for (19), PDEs (20) and (22) do not maintain the constraint $|\nabla \Phi_i| = 1$, so we regularly need to reinitialize the level set functions $\Phi_i$ to make sure they remain signed distance functions. This can be done, for instance, with the PDE defined in [12].
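One explicit step of the coupled scheme can be sketched as follows. We read (22) as $\Phi_i \leftarrow \Phi_i - dt\,\delta_\alpha(\Phi_i)\,[\,e_i (u_0 - \mu_i)^2/\sigma_i^2 - \gamma_i \kappa_i + \lambda(\sum_j H_\alpha(\Phi_j) - 1)\,]$, with $\kappa_i = \mathrm{div}(\nabla\Phi_i/|\nabla\Phi_i|)$. The unit grid, the central differences, the omission of reinitialization, and the function names are our assumptions.

```python
import numpy as np


def dirac_alpha(s, alpha):
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= alpha, (1.0 + np.cos(np.pi * s / alpha)) / (2.0 * alpha), 0.0)


def heaviside_alpha(s, alpha):
    s = np.asarray(s, dtype=float)
    inner = 0.5 * (1.0 + s / alpha + np.sin(np.pi * s / alpha) / np.pi)
    return np.where(s > alpha, 1.0, np.where(s < -alpha, 0.0, inner))


def curvature(phi, eps=1e-12):
    """div(grad phi / |grad phi|), central differences on a unit grid."""
    p0, p1 = np.gradient(phi)
    norm = np.sqrt(p0 ** 2 + p1 ** 2) + eps
    return np.gradient(p0 / norm, axis=0) + np.gradient(p1 / norm, axis=1)


def multiphase_step(phis, u0, mu, sigma, e, gamma, lam, alpha, dt):
    """One explicit step of the scheme (22) for all K level set functions."""
    phis = np.asarray(phis, dtype=float)
    coupling = heaviside_alpha(phis, alpha).sum(axis=0) - 1.0   # <0: vacuum, >0: overlap
    new = np.empty_like(phis)
    for i, phi in enumerate(phis):
        force = (e[i] * (u0 - mu[i]) ** 2 / sigma[i] ** 2
                 - gamma[i] * curvature(phi)
                 + lam * coupling)
        new[i] = phi - dt * dirac_alpha(phi, alpha) * force
    return new


u0 = np.full((5, 5), 100.0)
params = dict(u0=u0, mu=[100.0, 200.0], sigma=[10.0, 10.0], e=[1.0, 1.0],
              gamma=[0.1, 0.1], lam=5.0, dt=0.1)
far = np.stack([np.ones((5, 5)), -np.ones((5, 5))])       # class 0 owns every pixel, no interface nearby
vacuum = np.stack([np.zeros((5, 5)), -np.ones((5, 5))])   # class 0 on its zero level, class 1 absent
step_far = multiphase_step(far, alpha=0.01, **params)
step_vacuum = multiphase_step(vacuum, alpha=0.5, **params)
```

Away from every zero level set, $\delta_\alpha$ vanishes and nothing moves. In the vacuum case the coupling term is negative, so the class-0 function rises (here to 0.5) and claims the unlabeled pixels.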



5 Experimental results 

We present some results for synthetic and real images. More experiments were 
conducted in [10], including noisy data. 
Synthetic data presented in Fig. 2 are provided by the Research Group GDR ISIS. This image contains three classes with predefined parameters $\mu_i$ and $\sigma_i$. We initialized three $\Phi_i$'s whose zero level sets (ZLSs) are circular. We show the evolution of the ZLSs, and the resulting classification is given in false colors. We chose a small value for the $\gamma_i$'s in order to retrieve the non-smooth boundary of the class 3 object on the right-hand side of the data. Black pixels in the false color image are pixels of vacuum (unclassified pixels).

The image treated in Fig. 3 is a SPOT satellite image provided by the French Space Agency CNES. This image contains four classes whose parameters $\mu_i$ and $\sigma_i$ were estimated in [1]. We show the evolution of the ZLSs and the final classification (false color image). For these data, we use an automatic method for the initialization of the $\Phi_i$'s that we call "seed initialization". This method consists of cutting the data image $u_0$ into $N$ windows $W_l$, $l = 1 \ldots N$, of predefined size. We compute the average $m_l$ of $u_0$ on each window $W_l$. We then select the index $k$ such that $k = \arg\min_i |m_l - \mu_i|$, and we initialize the corresponding circular signed distance function $\Phi_k$ on each $W_l$. Windows do not overlap and each of them supports one and only one function $\Phi_k$; therefore we avoid overlapping of the initial $\Phi_k$'s. The size of the windows is related to the smallest details we expect to detect. The major advantages of this simple initialization method are: it is automatic (only the size of the windows has to be fixed), it accelerates convergence (the smaller the windows, the faster the convergence), and it is less sensitive to noise (in the sense that we compute the average $m_l$ of $u_0$ over each window before selecting the function whose mean $\mu_k$ is closest to $m_l$).
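The seed initialization can be sketched as follows. The seed radius (40% of the window size), the background value $-ws$ for pixels outside every seed, skipping partial windows at the image border, and the name `seed_initialization` are all hypothetical choices for illustration; the text only fixes the window averaging and the nearest-mean selection.

```python
import numpy as np


def seed_initialization(u0, mu, ws, radius=None):
    """One circular signed distance function per ws x ws window (sketch).

    Returns phis with shape (K, H, W) and the class index chosen per window.
    Partial windows at the right/bottom border are left unassigned.
    """
    H, W = u0.shape
    K = len(mu)
    radius = 0.4 * ws if radius is None else radius     # hypothetical seed radius
    phis = np.full((K, H, W), -float(ws))               # negative: outside every seed
    assign = np.empty((H // ws, W // ws), dtype=int)
    for a in range(H // ws):
        for b in range(W // ws):
            rows = slice(a * ws, (a + 1) * ws)
            cols = slice(b * ws, (b + 1) * ws)
            m = u0[rows, cols].mean()                   # average m_l over window W_l
            k = int(np.argmin([abs(m - mk) for mk in mu]))
            assign[a, b] = k
            yy, xx = np.mgrid[rows, cols]
            cy, cx = a * ws + (ws - 1) / 2.0, b * ws + (ws - 1) / 2.0
            phis[k, rows, cols] = radius - np.hypot(yy - cy, xx - cx)
    return phis, assign


u0 = np.zeros((8, 8))
u0[:, :4] = 100.0
u0[:, 4:] = 200.0
phis, assign = seed_initialization(u0, mu=[100.0, 200.0], ws=4)
```

Because each window hosts exactly one seed, the initial $\Phi_k$'s cannot overlap, which matches the motivation given above.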



6 Conclusion 

We have presented a variational model based on a level set formulation for image classification. The level set formulation is a way to represent regions and sets of interfaces with a continuous function defined over the whole support of the image. The minimization of the functional leads to a set of coupled PDEs which are considered through a dynamical scheme. Each PDE guides a level set function according to internal forces (length minimization) and external ones (data term, no vacuum and no region overlapping). Results on both synthetic and satellite images are given. In [10] we proposed a way of introducing an additional restoration term in the model through the minimization of the functional:



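$$G_\alpha(u, \Phi_1, \ldots, \Phi_K) = \sum_{i=1}^{K} \gamma_i \int_\Omega \delta_\alpha(\Phi_i)\, |\nabla \Phi_i|\, dx + \frac{\lambda}{2} \int_\Omega \left( \sum_{i=1}^{K} H_\alpha(\Phi_i) - 1 \right)^2 dx + \sum_{i=1}^{K} e_i \int_\Omega H_\alpha(\Phi_i)\, \frac{(u - \mu_i)^2}{\sigma_i^2}\, dx + \lambda_1 \int_\Omega (Ru - u_0)^2\, dx + \lambda_2 \int_\Omega \varphi(|\nabla u|)\, dx \tag{23}$$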





[Fig. 2 panel: classification]



Fig. 2. ZLS evolution and classification for synthetic data containing three classes ($\mu_1 = 100.0$, $\mu_2 = 128.0$ and $\mu_3 = 160.0$). Parameters are: $\lambda = 5.0$, $dt = 0.2$, and for all $i$ we have $\gamma_i = 0.1$ and $\alpha = 0.01$. The final panel is the classification result in false colors.
[Fig. 3 panels: SPOT data; seed initialization; iteration 50; iteration 300; classification]



Fig. 3. SPOT satellite image containing 4 classes with seed initialization (on windows 
of size 9x9) : We show three steps of the ZLS evolution. Final figure is the classification 
result with false colors. 
with $\varphi$ being a regularizing function and $R$ being the impulse response of the physical system. We alternately minimize with respect to $u$ (restoration) and with respect to the $\phi_i$'s (classification). First results are promising, and we will study this model more precisely in future work. Further work will also be conducted to deal with the estimation of the class parameters (unsupervised classification). We also plan to extend this model to multispectral data (with applications to multiband satellite data and to color imaging).
Abstract. The behaviour of critical points of Gaussian scale-space images is mainly described by their creation and annihilation. In the existing literature these events are determined in so-called canonical coordinates. A description in a user-defined Cartesian coordinate system is stated, as well as the results of a straightforward implementation. The location of a catastrophe can be predicted with subpixel accuracy. An example of an annihilation is given. Also, an upper bound is derived for the area where critical points can be created. Experimental data of an MR, a CT, and an artificial noise image are consistent with this result.



1 Introduction 

One way to understand the structure of an image is to embed it in a one-parameter family. In this way the image can be endowed with a topology. If a scale-parametrised Gaussian filter is applied, the parameter can be regarded as the “scale” or the “resolution” at which the image is observed. The resulting structure has become known as linear Gaussian scale space. In view of the ample literature on the subject we will henceforth assume familiarity with the basics of Gaussian scale-space theory [4,8,9,14,23,24,26,29,30].

In their original accounts both Koenderink and Witkin proposed to investigate the “deep structure” of an image, i.e. structure at all levels of resolution simultaneously. Encouraged by results in specific image analysis applications, interest has recently grown in establishing a generic underpinning of deep structure. Results from this could serve as a common basis for a diversity of multiresolution schemes. Such bottom-up approaches often rely on catastrophe theory [1,6,25,27,28], which is now fairly well established in the context of the scale-space paradigm.

The application of catastrophe theory in Gaussian scale space has been studied, e.g., by Damon [3] (probably the most comprehensive account of the subject) as well as by others [7,10,11,12,13,15,16,17,18,19,20,21,22,23].

Closely related to the present article is the work by Florack and Kuijper [5], introducing new theoretical tools. We will summarise some results in section 2 and give an experimental verification of the theory on both real and artificial data sets in section 3. This verification includes visualisation of several theoretical aspects applied to an MR, a CT, and an artificial noise image. Furthermore we show that the location in scale space of a catastrophe point can be predicted with subpixel accuracy. Of special interest are creations. We will show experimentally and theoretically that the regions in the image in which they can occur are typically small.



2 Theory 



The behaviour of critical points as the (scale) parameter changes is described by 
catastrophe theory. As the parameter changes continuously, the critical points 
move along critical curves. If the determinant of the Hessian does not become 
zero, these critical points are called Morse critical points. In a typical image 
these points are extrema (minima and maxima) and saddles. The Morse lemma 
states that the neighbourhood of a Morse critical point can essentially be de- 
scribed by a second order polynomial. At isolated points on a critical curve the 
determinant of the Hessian may become zero. These points are called non-Morse 
points. Neighbourhoods of such points need a third or higher order polynomial, 
as described by Thom’s theorem. If an image is slightly perturbed, the Morse 
critical points may undergo a small displacement, but nothing happens to them 
qualitatively. A non-Morse point, however, will change. In general it will split into a number of Morse critical points. This event is called morsification. Thom’s theorem provides a list of elementary catastrophes with canonical formulas¹ for the catastrophe germs and the perturbations. The Thom splitting lemma states that there exist canonical coordinates in which these events can be described. These coordinates, however, do not in general coincide with the user-defined coordinates, but are used for notational convenience. In Gaussian scale space the only generic
events are annihilations and creations of a pair of Morse points: an extremum 
and a saddle in the 2D case. All other events can be split into a combination 
of one of these events and one ‘in which nothing happens’. See Damon [3] for a 
proof. Canonical descriptions of these events are given by the following formulae: 



$$f^A(x,y;t) = x^3 + 6xt \pm (y^2 + 2t), \tag{1}$$

$$f^C(x,y;t) = x^3 - 6x(y^2 + t) \pm (y^2 + 2t). \tag{2}$$



Note that Eq. (1) and Eq. (2), describing annihilation and creation respectively, 
satisfy the diffusion equation 



$$\frac{\partial u}{\partial t} = \Delta u. \tag{3}$$



Here $\Delta$ denotes the Laplacian operator. The reader can easily verify that the form $f^A(x,y;t)$ corresponds to an annihilation at the origin via the critical path $(\pm\sqrt{-2t}, 0, t)$, $t < 0$, whereas $f^C(x,y;t)$ corresponds to a creation at the origin via the critical path $(\pm\sqrt{2t}, 0, t)$, $t > 0$.

¹ Notation due to Gilmore [6]. The term normal forms is also used in the literature, e.g. by Poston and Stewart [25].
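The claims about Eqs. (1) to (3) can be checked numerically. The sketch below, with hypothetical function names, evaluates the heat-equation residual of both canonical forms by central differences, and the spatial gradient along the stated critical paths; for these cubic polynomials the finite differences are exact up to rounding (first differences up to a negligible $O(h^2)$ term).

```python
import math

def f_annihilation(x, y, t, sign=1.0):
    """Canonical annihilation form, Eq. (1): x^3 + 6xt +/- (y^2 + 2t)."""
    return x ** 3 + 6 * x * t + sign * (y ** 2 + 2 * t)

def f_creation(x, y, t, sign=1.0):
    """Canonical creation form, Eq. (2): x^3 - 6x(y^2 + t) +/- (y^2 + 2t)."""
    return x ** 3 - 6 * x * (y ** 2 + t) + sign * (y ** 2 + 2 * t)

def heat_residual(f, x, y, t, h=1e-3):
    """Finite-difference estimate of u_t - (u_xx + u_yy); zero for solutions of Eq. (3)."""
    u_t = (f(x, y, t + h) - f(x, y, t - h)) / (2 * h)
    u_xx = (f(x + h, y, t) - 2 * f(x, y, t) + f(x - h, y, t)) / h ** 2
    u_yy = (f(x, y + h, t) - 2 * f(x, y, t) + f(x, y - h, t)) / h ** 2
    return u_t - (u_xx + u_yy)

def spatial_gradient(f, x, y, t, h=1e-6):
    """Central-difference gradient (u_x, u_y) at fixed t."""
    return ((f(x + h, y, t) - f(x - h, y, t)) / (2 * h),
            (f(x, y + h, t) - f(x, y - h, t)) / (2 * h))

# Points on the critical paths: (+/-sqrt(-2t), 0, t) for t < 0 (annihilation)
# and (+/-sqrt(2t), 0, t) for t > 0 (creation); the gradient should vanish there.
for t in (-0.5, -2.0):
    for x in (math.sqrt(-2 * t), -math.sqrt(-2 * t)):
        print("annihilation", (x, 0.0, t), spatial_gradient(f_annihilation, x, 0.0, t))
for t in (0.5, 2.0):
    for x in (math.sqrt(2 * t), -math.sqrt(2 * t)):
        print("creation", (x, 0.0, t), spatial_gradient(f_creation, x, 0.0, t))
```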

In general the user-defined coordinates will not equal the canonical coordinates. It is therefore helpful to have a so-called covariant formalism, in which results are stated in an arbitrary coordinate system. In such a formalism, the first order approximation of a non-Morse point is given by the linear system

$$\begin{pmatrix} H & w \\ z^T & c \end{pmatrix}\begin{pmatrix} x \\ t \end{pmatrix} = -\begin{pmatrix} g \\ \det H \end{pmatrix}, \tag{4}$$



in which the coefficients are determined by the first order derivatives of the image’s gradient $g$ and Hessian determinant $\det H$, evaluated at the point of expansion $(x_0, t_0)$ near the critical point of interest, as follows:

$$H = \nabla g, \quad w = \partial_t g, \quad z = \nabla \det H, \quad c = \partial_t \det H. \tag{5}$$



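See Florack and Kuijper [5] for more details. In 2D images, where $\mathbf{x} = (x, y)$, this becomes

$$H = \begin{pmatrix} L_{xx} & L_{xy} \\ L_{xy} & L_{yy} \end{pmatrix}, \tag{6}$$

$$w = \begin{pmatrix} \Delta L_x \\ \Delta L_y \end{pmatrix} = \begin{pmatrix} L_{xxx} + L_{xyy} \\ L_{xxy} + L_{yyy} \end{pmatrix}, \tag{7}$$

$$z = \begin{pmatrix} L_{xxx}L_{yy} + L_{xx}L_{xyy} - 2L_{xy}L_{xxy} \\ L_{yyy}L_{xx} + L_{yy}L_{xxy} - 2L_{xy}L_{xyy} \end{pmatrix}, \tag{8}$$

and

$$c = L_{xx}\,\Delta L_{yy} - 2L_{xy}\,\Delta L_{xy} + L_{yy}\,\Delta L_{xx}, \tag{9}$$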



where Eq. (3) has been used. Thus the first order scheme requires spatial derivatives up to fourth order. These derivatives are obtained at any scale by linear filtering:

$$\frac{\partial^{m+n} u(x,y;\sigma)}{\partial x^m\,\partial y^n} \stackrel{\mathrm{def}}{=} \int u(x',y')\,\frac{\partial^{m+n}\phi(x'-x,y'-y;\sigma)}{\partial x^m\,\partial y^n}\,dx'\,dy',$$

where $u(x, y)$ is the input image and $\phi(x, y; \sigma)$ a normalised Gaussian of scale $\sigma$. It has been shown by Blom [2] that we can take derivatives up to fourth order without compromising the results, provided the scale is somewhat larger than the pixel scale. It is important to note that Eqs. (4)-(9) hold in any Cartesian coordinate system. This property of form invariance is known as covariance.
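The Gaussian derivative filtering above can be sketched in a few lines. This version is an assumption-laden illustration, not the authors' implementation: it samples the continuous Gaussian derivatives (using the Hermite-polynomial identity for derivatives of a Gaussian), truncates the kernels at $6\sigma$, evaluates a single pixel, and does no boundary handling. The function names are hypothetical.

```python
import math

def hermite(n, z):
    """Physicists' Hermite polynomial H_n(z) via the three-term recurrence."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, 2.0 * z
    for k in range(1, n):
        h_prev, h = h, 2.0 * z * h - 2.0 * k * h_prev
    return h

def gaussian_derivative_kernel(order, sigma):
    """Sampled order-th derivative of a normalised 1D Gaussian, truncated at 6 sigma."""
    radius = int(math.ceil(6 * sigma))
    s = sigma * math.sqrt(2.0)
    kernel = {}
    for i in range(-radius, radius + 1):
        g = math.exp(-i * i / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        # d^n/dx^n exp(-x^2/(2 sigma^2)) = (-1/s)^n H_n(x/s) exp(-x^2/(2 sigma^2))
        kernel[i] = (-1.0 / s) ** order * hermite(order, i / s) * g
    return kernel

def derivative_at(img, x0, y0, m, n, sigma):
    """L_{x^m y^n} at pixel (x0, y0) by separable convolution with Gaussian derivatives.

    img is a list of rows (img[y][x]); (x0, y0) must lie at least 6 sigma from the border.
    """
    kx = gaussian_derivative_kernel(m, sigma)
    ky = gaussian_derivative_kernel(n, sigma)
    total = 0.0
    for j, wy in ky.items():
        for i, wx in kx.items():
            total += img[y0 - j][x0 - i] * wx * wy
    return total


# Usage: on u = (x - 20)^2 the second x-derivative should be close to 2
N, c = 41, 20
quad = [[(x - c) ** 2 for x in range(N)] for y in range(N)]
print(derivative_at(quad, c, c, 2, 0, 2.0))
```

Checking the kernels on polynomial test images like this is one way to make sure they are correct before relying on the fourth order derivatives that Eqs. (6)-(9) require.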

At Morse critical points we must restrict ourselves to $Hx + wt = -g$, i.e. the first row of Eq. (4). The solution is easily found to be

$$x = -H^{-1}(g + wt). \tag{10}$$

If we define $t = \det H\,\tau$, Eq. (10) becomes $x = -H^{-1}g - \tilde H w\,\tau$, where the matrix $\tilde H$ is the transposed cofactor matrix, defined by $H\tilde H = \det H\, I$. In 2D $\tilde H$ reads

$$\tilde H = \begin{pmatrix} L_{yy} & -L_{xy} \\ -L_{xy} & L_{xx} \end{pmatrix}. \tag{11}$$
Note that $\tilde H$ exists even if $H$ is singular. At critical points the scale-space velocity is defined by

$$\tilde w = -\tilde H w. \tag{12}$$

Thus, instead of tracing the two branches of the critical curve with Lindeberg’s drift velocity of critical points [22,23], $v = -H^{-1}w$ (if defined), the curve is now parametrised by a continuous function that is non-degenerate at the catastrophe point. Note that the scale-space velocity $\tilde w$ has the direction of $v$ at extrema, is opposite at saddles, and remains well-defined even if $v$ does not exist; this follows from $v = \tilde w/\det H$, with $\det H > 0$ at extrema and $\det H < 0$ at saddles.
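The relation between the scale-space velocity $\tilde w$ of Eq. (12) and Lindeberg's drift velocity $v$ can be illustrated numerically. In this sketch the derivative values are arbitrary illustrative numbers (not taken from any image) and the function names are hypothetical; $v$ is obtained by solving $Hv = -w$ directly by Cramer's rule, so that it can be compared with $\tilde w$ independently.

```python
def transposed_cofactor(Lxx, Lxy, Lyy):
    """H~ of Eq. (11); H H~ = det(H) I, and H~ exists even when H is singular."""
    return ((Lyy, -Lxy), (-Lxy, Lxx))

def scale_space_velocity(Lxx, Lxy, Lyy, w):
    """w~ = -H~ w, Eq. (12)."""
    (a, b), (c, d) = transposed_cofactor(Lxx, Lxy, Lyy)
    return (-(a * w[0] + b * w[1]), -(c * w[0] + d * w[1]))

def drift_velocity(Lxx, Lxy, Lyy, w):
    """Lindeberg's drift v = -H^{-1} w, by Cramer's rule; None if det H = 0."""
    det = Lxx * Lyy - Lxy ** 2
    if det == 0:
        return None
    return ((-w[0] * Lyy + w[1] * Lxy) / det,
            (-Lxx * w[1] + Lxy * w[0]) / det)

def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]


# w = (Lxxx + Lxyy, Lxxy + Lyyy), Eq. (7), for illustrative third order derivatives
w = (1.0 + 0.7, -0.4 + 0.2)
extremum = (2.0, 0.5, 3.0)    # (Lxx, Lxy, Lyy) with det H > 0
saddle = (2.0, 0.5, -1.0)     # det H < 0
singular = (4.0, 2.0, 1.0)    # det H = 0: v undefined, w~ still defined
for hess in (extremum, saddle, singular):
    print(hess, scale_space_velocity(*hess, w), drift_velocity(*hess, w))
```

At the extremum the two vectors point the same way, at the saddle they point in opposite directions, and at the singular Hessian only $\tilde w$ is available.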

At non-Morse critical points the determinant of $H$ becomes zero and we need to invert the complete linear system, Eq. (4). If we define

$$M = \begin{pmatrix} H & w \\ z^T & c \end{pmatrix}, \tag{13}$$

the solution of Eq. (4) becomes

$$\begin{pmatrix} x \\ t \end{pmatrix} = -M^{-1}\begin{pmatrix} g \\ \det H \end{pmatrix}. \tag{14}$$

In general this inverse matrix exists even if the Hessian is singular. Florack and Kuijper [5] have proven that $\det M < 0$ at annihilations and $\det M > 0$ at creations, where

$$\det M = c\,\det H + z^T\tilde w. \tag{15}$$

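In a full (2+1)D scale-space neighbourhood of a catastrophe the differential invariant $\det M$ reads

$$\det M = \bigl[L_{xx}(L_{xxyy} + L_{yyyy}) + L_{yy}(L_{xxxx} + L_{xxyy}) - 2L_{xy}(L_{xxxy} + L_{xyyy})\bigr]\bigl(L_{xx}L_{yy} - L_{xy}^2\bigr) - (L_{xxx} + L_{xyy})\bigl(L_{yy}z_x - L_{xy}z_y\bigr) - (L_{xxy} + L_{yyy})\bigl(L_{xx}z_y - L_{xy}z_x\bigr),$$

with $z_x = L_{xxx}L_{yy} + L_{xx}L_{xyy} - 2L_{xy}L_{xxy}$ and $z_y = L_{yyy}L_{xx} + L_{yy}L_{xxy} - 2L_{xy}L_{xyy}$ the components of $z$.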
At catastrophes $\det H = 0$, so Eq. (15) reduces to

$$\det M = z^T\tilde w, \tag{16}$$

which is the inner product between the spatial derivative of $\det H$, Eq. (5), and the scale-space velocity $\tilde w$, Eq. (12). It can be seen that only spatial derivatives up to third order are required at the catastrophe points. In the next section we will apply these results to several images.
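The first order scheme of Eqs. (4) to (14) amounts to solving one $3 \times 3$ linear system per point. The following sketch, with hypothetical names, builds $M$ from a local jet of derivatives, solves Eq. (14) by Cramer's rule and returns $\det M$. It is exercised on the canonical annihilation form (1), whose jet is known exactly, rather than on image data.

```python
JET_KEYS = ("x", "y", "xx", "xy", "yy", "xxx", "xxy", "xyy", "yyy",
            "xxxx", "xxxy", "xxyy", "xyyy", "yyyy")

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, b):
    """Solve m u = b by Cramer's rule (adequate for a single 3x3 system)."""
    d = det3(m)
    if d == 0:
        raise ValueError("M is singular")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = b[r]
        sol.append(det3(mc) / d)
    return sol

def catastrophe_estimate(L):
    """Return ((x, y, t) displacement of Eq. (14), det M) from a 2D jet L."""
    det_h = L["xx"] * L["yy"] - L["xy"] ** 2
    w = [L["xxx"] + L["xyy"], L["xxy"] + L["yyy"]]                          # Eq. (7)
    z = [L["xxx"] * L["yy"] + L["xx"] * L["xyy"] - 2 * L["xy"] * L["xxy"],  # Eq. (8)
         L["yyy"] * L["xx"] + L["yy"] * L["xxy"] - 2 * L["xy"] * L["xyy"]]
    c = (L["xx"] * (L["xxyy"] + L["yyyy"]) + L["yy"] * (L["xxxx"] + L["xxyy"])
         - 2 * L["xy"] * (L["xxxy"] + L["xyyy"]))                           # Eq. (9)
    M = [[L["xx"], L["xy"], w[0]],                                          # Eq. (13)
         [L["xy"], L["yy"], w[1]],
         [z[0], z[1], c]]
    rhs = [-L["x"], -L["y"], -det_h]                                        # Eq. (4)
    return solve3(M, rhs), det3(M)


# Jet of f^A (Eq. (1), '+' sign) at (x, y, t) = (1, 0, -0.5), a point on its critical path
jet = dict.fromkeys(JET_KEYS, 0.0)
jet.update({"x": 3 * 1.0 ** 2 + 6 * (-0.5), "xx": 6.0, "yy": 2.0, "xxx": 6.0})
(dx, dy, dt), det_m = catastrophe_estimate(jet)
print(dx, dy, dt, det_m)
```

For this jet the estimate moves the point to $(0, 0, 0.5)$, whereas the true catastrophe of Eq. (1) lies at the origin at $t = 0$: the first order estimate overshoots in $t$. The negative $\det M$ correctly signals an annihilation.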

3 Experimental results 

In our experiments we used a 64 × 64 subimage of a 256 × 256 MR scan (Fig. 1a and b), of a CT scan (Fig. 1c and d), and a 64 × 64 artificial image with Gaussian noise of mean zero and standard deviation $\sigma = 10$, also denoted as N(0,10) (Fig. 1e).
Fig. 1. a) Original 256 × 256 pixel MR image; b) 64 × 64 pixel subimage of a); c) original 256 × 256 pixel CT image; d) 64 × 64 pixel subimage of c); e) 64 × 64 artificial Gaussian N(0,10) noise image.



3.1 Visualisation of $z$ and $\tilde w$

As an example of the vectors $\tilde w$ (see Eq. (12)) and $z$ (see Eq. (5)) we selected two critical points of the MR image (Fig. 1b) at scale $\sigma = 2.46$. This image with its critical points is shown in Fig. 2a. Extrema (saddle points) are visualised by the white (black) dots. In the upper middle part of this image a critical isophote generated by a saddle and enclosing two extrema is shown (see also Fig. 2b). At a larger scale the saddle point will annihilate with the upper one of these extrema. At these two points we have calculated the direction and magnitude of the vectors $\tilde w$ and $z$. The vectors are shown at these points at two successive scales, $\sigma = 2.46$ (Fig. 2c) and $\sigma = 2.83$ (Fig. 2d). Indeed the velocity (given by $\tilde w$) of the extremum (dark arrow at the white dot) points in the direction of the saddle, and thus in the direction of the point of annihilation. The velocity vector at the saddle has the same direction, as a result of the parametrisation by Eq. (12).

Furthermore, since the point where the annihilation takes place (at $\det H = 0$) lies between the two critical points, the vector $z$, which is the normal vector (recall Eq. (5)) to the zero-crossing of $\det H$, points from the saddle towards the extremum, both at the saddle and at the extremum.

Finally, it can be seen that the vectors $z$ and $\tilde w$ at the critical points make an angle of more than $\pi/2$. Since $\det M$ is the inner product of these vectors at a catastrophe (see Eq. (16)), this leads to a negative sign of $\det M$, indicating that the two critical points approach each other and eventually disappear.



3.2 Location of the catastrophe 

Although the location of the critical points in the image can easily be calculated by using the zero-crossings of the derivatives, the subpixel position of the catastrophe point in scale space requires inversion of the complete linear system, Eq. (4), yielding Eq. (14). As an example we took the same two critical points as in the previous section. The resulting vectors at 4 successive scales for the MR subimage (Fig. 2c) are shown in Fig. 3. At each pixel the projection of the vector on the spatial plane is shown. A bright (dark) arrow denotes a positive (negative) scale coordinate. The approximate location of the catastrophe can be found with subpixel precision by averaging the arrows, as shown in Table 1. The
Fig. 2. a) Critical points (extrema white, saddles black) of Fig. 1b at scale $\sigma = 2.46$. In the field of interest the critical isophote through a saddle is shown; b) subimage of a), showing the field of interest more clearly. The saddle is about to annihilate with the upper extremum; c) subimage of the two annihilating critical points and the vectors $\tilde w$ (dark) and $z$ (bright) at scale $\sigma = 2.46$; d) same, at scale $\sigma = 2.83$.




Fig. 3. Visualisation of Eq. (14), the vector $(x, y, t)$; a bright (dark) arrow signifies a positive (negative) value of the $t$-component. The black dot is located at the mean value of the inner 15 arrows; the ellipse shows the standard deviation (see Table 1). First row: a: scale $\sigma = 2.34$; b: $\sigma = 2.46$; c: $\sigma = 2.59$; d: $\sigma = 2.72$. Second row: e: $\sigma = 2.86$; f: $\sigma = 3.00$; g: $\sigma = 3.16$, a catastrophe has occurred; h: $\sigma = 3.32$.
| scale | x-coordinate | y-coordinate | t-coordinate | estimated scale |
|---|---|---|---|---|
| 2.34 | 0.481197 ± 1.27275 | -0.111758 ± 6.54458 | 0.630053 ± 3.7568 | 2.59501 ± 1.4477 |
| 2.46 | 0.869898 ± 0.401954 | 1.83346 ± 1.26546 | 1.58998 ± 0.663776 | 3.03808 ± 0.218489 |
| 2.59 | 0.893727 ± 0.340422 | 1.72886 ± 0.79602 | 1.3391 ± 0.447622 | 3.06008 ± 0.146278 |
| 2.72 | 0.92611 ± 0.286782 | 1.73222 ± 0.580028 | 1.10524 ± 0.434127 | 3.09831 ± 0.140117 |
| 2.86 | 0.95843 ± 0.250132 | 1.75858 ± 0.429409 | 0.824525 ± 0.483923 | 3.13293 ± 0.154464 |
| 3.00 | 0.991123 ± 0.26873 | 1.79548 ± 0.504445 | 0.466264 ± 0.597825 | 3.15556 ± 0.189451 |
| 3.16 | 1.02368 ± 0.380863 | 1.83618 ± 0.921176 | -0.00573945 ± 0.792309 | 3.15638 ± 0.251019 |
| 3.32 | 1.05174 ± 0.603366 | 1.86346 ± 1.67306 | -0.642702 ± 1.1066 | 3.12054 ± 0.354618 |



Table 1. Estimation of the location of the catastrophe, as an average of the 15 arrows 
in the rectangle spanned by the two critical points of Fig. 3a. The origin in the (x, y)- 
plane is fixed for all figures at the middle of the saddle (black square) of Fig. 3a. The 
average value of the t-direction is positive below catastrophe scale and negative above it. 



black dot in Fig. 3 is located at the estimated position of the catastrophe, the 
ellipse shows the standard deviation of the estimation. 

Below the catastrophe scale the location is accurate, whereas at a scale above it ($\sigma = 3.32$, see Fig. 3h) the estimated location turns out to be more uncertain. The estimate of the $t$-coordinate is positive below the catastrophe scale and negative above it, as expected. The standard deviation is largely influenced by the cells that are distant from the critical curve, which can also be seen in Fig. 3h. Since the relation between scale $\sigma$ and coordinate $t$ is given by $t = \frac12\sigma^2$, we can easily calculate the estimated scale $\sigma_{est} = \sqrt{\sigma^2 + 2t_{calc}}$ with error $d\sigma_{est} = \partial_t\sigma_{est}\,dt_{calc} = dt_{calc}/\sigma_{est}$.

By increasing the scale in small steps, the catastrophe is experimentally found between the scales 3.050 and 3.053, which is covered by all estimated scales in Table 1. Since the estimate is a linear approximation of the top of a curve, a small overestimation (here: a tenth of a pixel) is expected and indeed found in this case. In summary, the location of the catastrophe point can be pinned down by linear estimation with subpixel precision.
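The conversion from the estimated $t$-coordinate to a scale can be reproduced from the numbers in Table 1. This sketch uses a hypothetical helper name and the scales as listed in the table, which are rounded to two decimals, so the results agree with the last column only approximately (to within a few thousandths).

```python
import math

def estimated_scale(sigma, t_calc, dt_calc):
    """Return (sigma_est, d sigma_est), using t = sigma^2 / 2.

    sigma_est = sqrt(sigma^2 + 2 t_calc); d sigma_est = dt_calc / sigma_est.
    """
    s = math.sqrt(sigma ** 2 + 2.0 * t_calc)
    return s, dt_calc / s

# (scale, t-coordinate, its standard deviation) from Table 1
TABLE_1_T = [(2.34, 0.630053, 3.7568), (2.46, 1.58998, 0.663776),
             (2.59, 1.3391, 0.447622), (2.72, 1.10524, 0.434127),
             (2.86, 0.824525, 0.483923), (3.00, 0.466264, 0.597825),
             (3.16, -0.00573945, 0.792309), (3.32, -0.642702, 1.1066)]
for sigma, t, dt in TABLE_1_T:
    s, ds = estimated_scale(sigma, t, dt)
    print(f"{sigma:.2f}: {s:.4f} +/- {ds:.4f}")
```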



3.3 Fraction of the area where det M > 0 

Since creations can only occur where $\det M > 0$, we calculated the number of pixels in the three different images (Figs. 1a, c and e) where this invariant is positive. If we assume for the moment that all elements of the matrix $M$ are independent of each other, the distribution of catastrophes is in some sense random in the image, just as the distribution of extrema and saddle points, discriminated by the areas $\det H > 0$ and $\det H < 0$, respectively. However, since annihilations are expected to occur more often, and since the derivatives up to third and fourth order are not independent because they have to satisfy the heat equation, we expect the area where $\det M > 0$ to be small. In the following figures we show this fraction relative to the total area of the image.
For the MR image we see a relative area of at most 0.12 (Fig. 4, top-left). Furthermore, the logarithm of the number of critical points decreases linearly with the logarithm of scale (Fig. 4, top-right), with a slope of $-1.76 \pm 0.01$. An a priori estimate of this slope is $-2$, see e.g. Florack’s monograph [4].
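The slopes quoted here are slopes of straight lines in a log-log plot. One way to obtain such a slope is an ordinary least-squares fit, sketched below with a hypothetical function name; the text does not state which fitting procedure was actually used. A count proportional to $\sigma^{-2}$ gives a slope of exactly $-2$, the a priori value mentioned above.

```python
import math

def loglog_slope(scales, counts):
    """Ordinary least-squares slope of log(count) versus log(scale)."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(n) for n in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


scales = [1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
counts = [4000.0 * s ** -2 for s in scales]   # synthetic, proportional to sigma^-2
print(loglog_slope(scales, counts))
```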



[Fig. 4 panels: MR image, CT image, noise image; horizontal axes: 50 log of scale]

Fig. 4. Results of calculations. First row: MR image; second row: CT image; third row: artificial noise image. First column: fraction of $\det M > 0$, ranging from 0.04 to 0.12 for the MR and CT image, and less for the artificial noise image; second column: logarithm of the number of critical points, with slopes $-1.76 \pm 0.01$, $-1.74 \pm 0.02$, and $-1.84 \pm 0.01$, respectively.



In Fig. 5 the sign of $\det M$ for the MR subimage (Fig. 1b) is shown at four successive scales. It appears that the locations in the image where $\det M$ is positive are relatively small isolated areas.

For the CT image we see more or less the same results (Fig. 4, second row): the fraction where $\det M$ is positive is slightly higher at small scales ($\sigma < 2.22$, the value 40 on the horizontal axis) and slightly smaller at large scales. The slope of the graph of the logarithm of the number of critical points against the logarithm of scale is found to be $-1.74 \pm 0.02$.

Fig. 5. In white the area where $\det M > 0$. a: at scale $\sigma = 1.57$, corresponding to the value 22.5 on the horizontal axis of Fig. 4; b: at scale $\sigma = 2.46$ (value 45); c: at scale $\sigma = 3.866$ (value 67.5); d: at scale $\sigma = 6.05$ (value 90).

In the noise image the relative area where $\det M > 0$ is significantly smaller than in the MR and CT images. This might indicate that creations require some global structure (like a ridge) that is absent in a noise image. The logarithm of the number of extrema has a slope of $-1.84 \pm 0.01$ (Fig. 4, bottom-right), which is closer to the expected value $-2$ than the slopes for the MR and CT images. This might also be caused by the lack of structure in the noise image.



3.4 Estimation of the area where det M > 0 



In the previous section the fraction of the area where $\det M > 0$ was found to range from 0.04 to 0.12. A mathematical analysis of the sign of $\det M$ may show how often creations are to be expected. At non-Morse points this invariant can be simplified considerably. If the Hessian becomes singular, its rows (or, equivalently, its columns) are linearly dependent, i.e. $(L_{xx}, L_{xy}) = \lambda(L_{xy}, L_{yy})$. Therefore² $L_{xx} = \lambda^2 L_{yy}$ and $L_{xy} = \lambda L_{yy}$. So in general, the Hessian at a catastrophe and its transposed cofactor matrix can be described by

$$H = L_{yy}\begin{pmatrix} \lambda^2 & \lambda \\ \lambda & 1 \end{pmatrix}, \qquad \tilde H = L_{yy}\begin{pmatrix} 1 & -\lambda \\ -\lambda & \lambda^2 \end{pmatrix}. \tag{17}$$



The second order Taylor expansion of the image now reads $\frac12 L_{yy}\lambda^2 x^2 + \lambda L_{yy}xy + \frac12 L_{yy}y^2$, which reduces to $\frac12 L_{yy}(\lambda x + y)^2$. The parameter $\lambda$ depends on the rotation between the axes of the real and the canonical coordinates. If these coincide we have $\lambda = 0$, i.e. both $L_{xx}$ and $L_{xy}$ are zero, see Eqs. (1) and (2). With Eqs. (7)-(8), (12) and (16)-(17), the explicit form of $\det M$ at a catastrophe in 2D reduces significantly to



$$\det M = L_{yy}^2\,\bigl(L_{xxx} - 3\lambda L_{xxy} + 3\lambda^2 L_{xyy} - \lambda^3 L_{yyy}\bigr)\bigl(-L_{xxx} + \lambda L_{xxy} - L_{xyy} + \lambda L_{yyy}\bigr). \tag{18}$$



² The choice of $L_{yy}$ as leading term is of minor importance; we could just as well have chosen $\mu(L_{xx}, L_{xy}) = (L_{xy}, L_{yy})$, leading to $L_{yy} = \mu^2 L_{xx}$ and $L_{xy} = \mu L_{xx}$, which would be particularly prudent if $L_{yy}$ is close to zero.
Equation (18) shows that the sign of $\det M$ depends only on the third order derivatives and the orientation of the critical curve, as determined by $\lambda$. If we assume that all third order derivatives are independent, the zero-crossings of Eq. (18) can be regarded as the union of two hyperplanes through the origin of the 4-dimensional $(L_{xxx}, L_{xxy}, L_{xyy}, L_{yyy})$ space. The hyperplanes divide this space into 4 subspaces where the determinant is either positive or negative, whereas any point on the hyperplanes leads to $\det M = 0$. The normal vectors to these hyperplanes are given by $n_1 = (1, -3\lambda, 3\lambda^2, -\lambda^3)$ and $n_2 = (-1, \lambda, -1, \lambda)$. The factor $L_{yy}^2$ does not change the sign of the determinant. By definition we then have

$$\cos\phi = \frac{n_1\cdot n_2}{\|n_1\|\,\|n_2\|} = \frac{-(1 + 6\lambda^2 + \lambda^4)}{\sqrt{(2 + 2\lambda^2)(1 + 9\lambda^2 + 9\lambda^4 + \lambda^6)}}. \tag{19}$$

This angle is invariant under the transformations $\lambda \to -\lambda$ and $\lambda \to 1/\lambda$. Fig. 6a shows the cosine of the angle for different values of $\lambda$.
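The factorisation in Eq. (18) and the closed form in Eq. (19) can be checked numerically. The sketch below (hypothetical names) computes $\det M = z^T\tilde w$ directly from Eqs. (7), (8), (11), (12) and (17), compares it with the factorised form on random third order derivatives, and evaluates $\cos\phi$ at a few values of $\lambda$.

```python
import math
import random

def det_m_direct(lam, Lyy, t3):
    """det M = z^T w~ (Eq. (16)) at a catastrophe with H from Eq. (17)."""
    Lxxx, Lxxy, Lxyy, Lyyy = t3
    Lxx, Lxy = lam ** 2 * Lyy, lam * Lyy           # singular Hessian, Eq. (17)
    zx = Lxxx * Lyy + Lxx * Lxyy - 2 * Lxy * Lxxy  # Eq. (8)
    zy = Lyyy * Lxx + Lyy * Lxxy - 2 * Lxy * Lxyy
    wx, wy = Lxxx + Lxyy, Lxxy + Lyyy              # Eq. (7)
    wtx = -(Lyy * wx - Lxy * wy)                   # w~ = -H~ w, Eqs. (11)-(12)
    wty = -(-Lxy * wx + Lxx * wy)
    return zx * wtx + zy * wty

def det_m_factored(lam, Lyy, t3):
    """Eq. (18): Lyy^2 (n1 . L3)(n2 . L3) with L3 = (Lxxx, Lxxy, Lxyy, Lyyy)."""
    n1 = (1.0, -3 * lam, 3 * lam ** 2, -lam ** 3)
    n2 = (-1.0, lam, -1.0, lam)
    a = sum(p * q for p, q in zip(n1, t3))
    b = sum(p * q for p, q in zip(n2, t3))
    return Lyy ** 2 * a * b

def cos_phi(lam):
    """Eq. (19): cosine of the angle between the normals n1 and n2."""
    num = -(1 + 6 * lam ** 2 + lam ** 4)
    den = math.sqrt((2 + 2 * lam ** 2) * (1 + 9 * lam ** 2 + 9 * lam ** 4 + lam ** 6))
    return num / den


rng = random.Random(1)
for _ in range(3):
    lam, Lyy = rng.uniform(-3, 3), rng.uniform(-2, 2)
    t3 = [rng.gauss(0.0, 1.0) for _ in range(4)]
    print(det_m_direct(lam, Lyy, t3), det_m_factored(lam, Lyy, t3))
print([round(cos_phi(v), 4) for v in (0.0, 0.5, 1.0, 2.0)])
```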





Fig. 6. Left: cosine of the angle between the planes given by Eq. (19). Right: fraction of the 4D $(L_{xxx}, L_{xxy}, L_{xyy}, L_{yyy})$-space where $\det M$ is smaller than zero.



Lemma 1. The fraction $\mu$ of the space (of third order derivatives) where creations can occur is bounded by $\frac{1}{\pi}\arccos\bigl(\frac{2}{5}\sqrt{5}\bigr) \le \mu \le \frac14$.



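Proof. The fraction of the space where annihilations can occur is given by the fraction of the space of third order derivatives where $\det M < 0$ and $\det H = 0$. By Eq. (18), $\det M < 0$ exactly where the two factors $n_1\cdot L$ and $n_2\cdot L$, with $L = (L_{xxx}, L_{xxy}, L_{xyy}, L_{yyy})$, have opposite signs; by symmetry this region consists of two opposite wedges, each with opening angle $\phi$. Since Eq. (19) is negative for all $\lambda$ and $\phi \in [0, \pi]$, the fraction $\phi/\pi$ gives the fraction of the space where annihilations can occur. This fraction varies from $\frac34$ at both $\lambda = 0$ and $\lambda \to \infty$, to $\frac1\pi\arccos\bigl(-\frac25\sqrt5\bigr) \approx 0.852\ldots$ at $\lambda = 1$, which follows directly from differentiation; see also Fig. 6b. Equivalently, creations can occur in at most $\frac14$ of all possible tuples $(L_{xxx}, L_{xxy}, L_{xyy}, L_{yyy})$.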



The usual generic events, e.g. discussed by Damon [3] and others [10], correspond to the case $\lambda = 0$. In the canonical coordinates the equations (1) and (2) are found. Then Eq. (18) reduces to $\det M = -L_{yy}^2 L_{xxx}(L_{xxx} + L_{xyy})$ and it can easily be seen that the fraction of the space is $\frac14$, i.e. a creation can occur for only $\frac14$ of the possible values of $L_{xxx}$ and $L_{xyy}$.
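Lemma 1 can also be illustrated by sampling. The sketch below follows the text in treating the four third order derivatives as independent, and additionally draws them as i.i.d. standard normal variables; that extra assumption makes the probability of $\det M > 0$ equal to the angular fraction $1 - \phi/\pi$. Function names are hypothetical.

```python
import math
import random

def creation_fraction_exact(lam):
    """1 - phi/pi, with cos(phi) from Eq. (19): fraction of jets with det M > 0."""
    c = -(1 + 6 * lam ** 2 + lam ** 4) / math.sqrt(
        (2 + 2 * lam ** 2) * (1 + 9 * lam ** 2 + 9 * lam ** 4 + lam ** 6))
    return 1.0 - math.acos(c) / math.pi

def creation_fraction_mc(lam, samples=100000, seed=0):
    """Monte Carlo estimate with i.i.d. standard normal third order derivatives."""
    rng = random.Random(seed)
    n1 = (1.0, -3 * lam, 3 * lam ** 2, -lam ** 3)
    n2 = (-1.0, lam, -1.0, lam)
    positive = 0
    for _ in range(samples):
        L = [rng.gauss(0.0, 1.0) for _ in range(4)]
        a = sum(p * q for p, q in zip(n1, L))
        b = sum(p * q for p, q in zip(n2, L))
        if a * b > 0:  # det M = Lyy^2 * a * b > 0: a creation is possible
            positive += 1
    return positive / samples


for lam in (0.0, 1.0, 3.0):
    print(lam, creation_fraction_exact(lam), creation_fraction_mc(lam, samples=20000))
```

Real images do not satisfy the independence assumption (section 3.3), so this only illustrates the geometric argument, not the measured fractions in Fig. 4.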






328 A. Kuijper and L. Florack 



4 Conclusion and Discussion 

We have used an operational scheme to characterise critical points in scale space. The characteristic local property of a critical point is determined by its Hessian signature (saddle or extremum). Pairs of critical points with opposite signature can be annihilated or created. Close to such catastrophes, empirically observed properties of the critical points are consistent with the presented theory. The location of catastrophes in scale space can be found with subpixel accuracy. The approximate location of an annihilation and the idea of scale-space velocity have been visualised. In general, more annihilations than creations are observed, probably because creations need a special structure of the neighbourhood. This is also indicated by the results for the noise image. We have shown that the area where creations can occur, determined by the third order derivatives, is at most $\frac14$. In our experiments this fraction is even smaller, at most approximately 0.125. It remains to be investigated whether the relative volumes of $\det M > 0$ and $\det M < 0$ are indicative of a similar ratio between creations and annihilations. In future work we will therefore examine the correlation between the distributions of the various derivatives in the definition of $\det M$. Blom [2] has given a general framework, which might give a more precise explanation of the small number of creations.
Abstract. Since the work by Osher and Sethian on level-sets algorithms for numerical shape evolutions, this technique has been used for a large number of applications in numerous fields. In medical imaging, this numerical technique has been successfully used, for example, in segmentation and cortex unfolding algorithms. The migration from a Lagrangian implementation to an Eulerian one via implicit representations or level-sets brought some of the main advantages of the technique, mainly topology independence and stability. This migration also means that the evolution is parametrization free, and therefore we do not know exactly how each part of the shape is deforming, and the point-wise correspondence is lost. In this note we present a technique to numerically track regions on surfaces that are being deformed using the level-sets method. The basic idea is to represent the region of interest as the intersection of two implicit surfaces, and then track its deformation from the deformation of these surfaces. This technique thus solves one of the main shortcomings of the very useful level-sets approach. Applications include lesion localization in medical images, region tracking in functional MRI visualization, and geometric surface mapping.

Key words: Level-sets, region tracking and correspondence, medical imag- 
ing, segmentation, visualization, shape deformation. 



1 Introduction 

The use of level-sets for the numerical implementation of n-dimensional¹ shape deformations became extremely popular following the seminal work of Osher and Sethian [17] (see for example [14,18] for some of the applications of this technique and a long list of references). In medical imaging, the technique has been successfully used for example for 2D and 3D segmentation [5,10,12,15,20,21].

* A journal version of this paper appears in the May 1999 issue of IEEE Trans. Medical Imaging.

¹ In this note we consider n > 3.



M. Nielsen et al. (Eds.): Scale-Space’99, LNCS 1682, pp. 330—338, 1999. 
The basic idea is to represent the deformation of an n-dimensional closed surface $S$ as the deformation of an $(n+1)$-dimensional function $\Phi$. The surface is represented in an implicit form in $\Phi$, for example, via its zero level-set. Formally, let us represent the initial surface $S(0)$ as the zero level-set of $\Phi$, i.e., $S(0) = \{X \in \mathbb R^n : \Phi(X, 0) = 0\}$. If the surface is deforming according to

$$\frac{\partial S(t)}{\partial t} = \beta\,\mathcal N_S, \tag{1}$$

where $\mathcal N_S$ is the unit normal to the surface, then this deformation is represented as the zero level-set of $\Phi(X,t): \mathbb R^n \times [0,\tau) \to \mathbb R$ deforming according to

$$\frac{\partial\Phi(X,t)}{\partial t} = \beta(X,t)\,\|\nabla\Phi(X,t)\|, \tag{2}$$

where $\beta(X,t)$ is computed on the level-sets of $\Phi(X,t)$. The formal analysis of this algorithm can be found for example in [6,7].
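Equation (2) is usually advanced in time with an upwind finite-difference scheme. The sketch below shows one explicit first-order upwind step of the kind introduced by Osher and Sethian, on a 2D grid for brevity. The grid spacing, the clamped borders and the function name are assumptions, and the time step must be kept small relative to $h/|\beta|$ for stability.

```python
import math

def upwind_step(phi, beta, dt, h=1.0):
    """One explicit step of d(phi)/dt = beta * |grad phi| on a 2D grid (list of rows).

    The equation is rewritten as phi_t + F |grad phi| = 0 with F = -beta and
    discretised with the first-order upwind gradient norm; borders are clamped.
    """
    ny, nx = len(phi), len(phi[0])

    def at(y, x):
        # replicate border values outside the grid
        return phi[min(max(y, 0), ny - 1)][min(max(x, 0), nx - 1)]

    out = [row[:] for row in phi]
    for y in range(ny):
        for x in range(nx):
            p = phi[y][x]
            dxm, dxp = (p - at(y, x - 1)) / h, (at(y, x + 1) - p) / h
            dym, dyp = (p - at(y - 1, x)) / h, (at(y + 1, x) - p) / h
            F = -beta[y][x]
            if F > 0:
                grad = math.sqrt(max(dxm, 0) ** 2 + min(dxp, 0) ** 2 +
                                 max(dym, 0) ** 2 + min(dyp, 0) ** 2)
            else:
                grad = math.sqrt(min(dxm, 0) ** 2 + max(dxp, 0) ** 2 +
                                 min(dym, 0) ** 2 + max(dyp, 0) ** 2)
            out[y][x] = p - dt * F * grad
    return out


# Usage: signed distance to a circle of radius 5, constant beta = 1
n = 21
phi = [[math.hypot(x - 10, y - 10) - 5.0 for x in range(n)] for y in range(n)]
beta = [[1.0] * n for _ in range(n)]
inside_before = sum(v < 0 for row in phi for v in row)
phi = upwind_step(phi, beta, dt=0.5)
inside_after = sum(v < 0 for row in phi for v in row)
print(inside_before, inside_after)
```

Because only $\Phi$ on the fixed grid is updated, nothing in this step records where an individual surface point has moved, which is the loss of correspondence discussed next.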

The basic idea behind this technique is that we migrate from a Lagrangian implementation (particles on the surface) to an Eulerian one, i.e., a fixed Cartesian coordinate system. This allows, for example, changes in the topology of the deforming surface $S$ to be followed automatically, since the topology of the function $\Phi$ is fixed. See the mentioned references for more details on the level-sets technique.

In a number of applications, it is important to know not just how the whole surface deforms, but also how some of its regions do. Since the parametrization is missing, this is not possible in a straightforward level-sets approach. This problem is related to the aperture problem in optical flow computation, and it is also the reason why the level-sets approach can only deal with parametrization-independent flows that do not contain tangential velocities. Although tangential velocities do not affect the geometry of the deforming shape, they do affect the ‘point correspondence’ in the deformation. For example, with a straightforward level-sets approach, it is not possible to determine where a given point $X_0 \in S(0)$ is at a certain time $t$. One way to solve this problem is to track isolated points with a set of ODEs, and this was done for example in grid generation and surface flattening; see [9,18]. This is a possible solution if we are just interested in tracking a number of isolated points. If we want to track regions, for example, then using ‘particles’ brings us back to a ‘Lagrangian formulation’ and some of the problems that actually motivated the level-sets approach. For example, what happens if the region splits during the deformation? What happens if the region of interest is represented by particles that start to come too close together in some parts of the region and too far from each other in others?

In this note we propose an alternative solution to the problem of region 
tracking on surface deformations implemented via level-sets.^ The basic idea is 
to represent the boundary of the region of interest 77. G 5 as the intersection 

^ A different level-set approach for intrinsic motions of generic 3D curves, together 
with very deep and elegant theoretical results, is introduced in [1]. This approach 
is difficult to implement numerically, and in some cases not fully appropriate for 
numerical 3D curve evolution [16]. A variation of this technique, with very good 
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of the given surface S and an auxiliary surface S, both of them given as zero 
level-sets of n -I- 1-dimensional functions and <P respectively.^ The tracking of 
the region TZ is given by tracking the intersection of these two surfaces, that is, 
by the intersection of the level-sets of <P and In the rest of this note we give 
details on the technique and present examples. 

Note that although we use the proposed technique to track regions of interest 
on deforming surfaces, with the region deformation dictated by the surface defor- 
mation, the same general approach here presented of simultaneously deforming 
n hypersurfaces (n > 2) and looking at the intersection of their level-sets can 
be used for the numerical implementation of generic geometric deformations of 
curves and surfaces of high co-dimension.^ 



2 The algorithm 

Assume the deformation of the surface S, given by (1), is implemented using 
the level-sets algorithm, i.e.. Equation (2). Let TZ & S he & region we want 
to track during this deformation, and dTZ its boundary. Define a new function 
<?(X, 0) : IR" — >■ K (a distance function for example), such that the intersection 
of its zero level-set S with S defines dTZ and then TZ. In other words, 

a7^(0) := 5(0) n 5(0) = {X e K” : <?(X, 0) = ^(X, 0) = 0}. 

The tracking of TZ is done by simultaneously deforming <1> and The auxiliary 
function <1> deforms according to 

^^^=/3(X,t)||V<?(X,t)||, (3) 

and then 5 deforms according to 



(4) 

We then have to find the velocity $\hat{\beta}$ as a function of $\beta$. In order to track the region of interest, $\partial\mathcal{R}$ must have exactly the same geometric velocity both in (2) and (3). The velocity in (2) (or (1)) is given by the problem at hand, and is $\beta\mathcal{N}_S$. Therefore, the velocity in (4) will be the projection of this velocity onto the normal direction $\mathcal{N}_{\hat{S}}$ (recall that the tangential component of the velocity does not affect the geometry of the flow). That is, for (at least) $\partial\mathcal{R}$,

$$\hat{\beta} = \beta\,\mathcal{N}_S \cdot \mathcal{N}_{\hat{S}}.$$

experimental results, is introduced in [11]. The Ambrosio-Soner approach and its variations deal with intrinsic curve velocities and do not address the surface-velocity projection needed for the tracking in this paper.

² Multiple level-set functions have been used in the past for problems like the motion of junctions [13]. Both the problem and its solution are different from the ones in this paper.

³ After this paper was accepted for publication, we became aware of recent work by Osher and colleagues using this general approach mainly to deform curves in 3D and curves on surfaces [4]. This work also does not deal with the projection of velocities as needed for our application.



Outside of the region corresponding to $\mathcal{R}$, the velocity $\hat{\beta}$ can be any function that connects smoothly with the values in $\partial\mathcal{R}$.⁴

For the moment, this technique requires finding the intersection of the zero level-sets of $\Phi$ and $\Psi$ at every time step, in order to compute $\hat{\beta}$. To avoid this, we choose a particular extension of $\hat{\beta}$ outside of $\partial\mathcal{R}$, and simply define $\hat{\beta}$ as the projection of $\beta\mathcal{N}_S$ for all the values of $X$ in the domain of $\Phi$ and $\Psi$. Therefore, the auxiliary level-sets flow is given by



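$$\frac{\partial \Psi}{\partial t}(X,t) = \beta(X,t)\left(\frac{\nabla\Phi(X,t)}{\|\nabla\Phi(X,t)\|} \cdot \frac{\nabla\Psi(X,t)}{\|\nabla\Psi(X,t)\|}\right)\|\nabla\Psi(X,t)\|,$$

and the region of interest $\mathcal{R}(t)$ is given by the portion of the zero level-sets common to $\Phi(X,t)$ and $\Psi(X,t)$:

$$\partial\mathcal{R}(t) = \{X \in \mathbb{R}^n : \Phi(X,t) = \Psi(X,t) = 0\}. \qquad (5)$$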



For a number of velocities $\beta$, short-term existence of solutions to the level-sets flow for $\Psi$ (in the viscosity framework) can be obtained from the results of Evans and Spruck [8].

This formulation gives the basic region tracking algorithm. In the next sec- 
tion, we present some examples. 
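The coupled flows can be sketched on a 2D grid, where $n = 2$ so that $S$ and $\hat{S}$ are curves and $\partial\mathcal{R}(t)$ is a set of points. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses plain central differences with clamped indices and explicit Euler steps, whereas a practical code would use the upwind level-set schemes of [17]; the helper names and the callable `beta(i, j)` are hypothetical. It relies on the identity $(\nabla\Phi/\|\nabla\Phi\| \cdot \nabla\Psi/\|\nabla\Psi\|)\,\|\nabla\Psi\| = \nabla\Phi\cdot\nabla\Psi/\|\nabla\Phi\|$, so no division by $\|\nabla\Psi\|$ is needed.

```python
import math

def grad(f, i, j):
    """Gradient (d/dx along columns, d/dy along rows) of a 2D list with unit spacing:
    central differences inside, one-sided differences at the border (clamped indices)."""
    h, w = len(f), len(f[0])
    jp, jm = min(j + 1, w - 1), max(j - 1, 0)
    ip, im = min(i + 1, h - 1), max(i - 1, 0)
    return ((f[i][jp] - f[i][jm]) / (jp - jm),
            (f[ip][j] - f[im][j]) / (ip - im))

def coupled_step(phi, psi, beta, dt, eps=1e-12):
    """One explicit Euler step of
         Phi_t = beta * |grad Phi|                          (the surface flow)
         Psi_t = beta * (grad Phi . grad Psi) / |grad Phi|  (the auxiliary flow)
    The second line equals beta * (N_S . N_Shat) * |grad Psi|, i.e. the projected
    velocity extended to every grid point."""
    h, w = len(phi), len(phi[0])
    new_phi = [row[:] for row in phi]
    new_psi = [row[:] for row in psi]
    for i in range(h):
        for j in range(w):
            gx, gy = grad(phi, i, j)
            px, py = grad(psi, i, j)
            norm_phi = math.hypot(gx, gy)
            b = beta(i, j)
            new_phi[i][j] = phi[i][j] + dt * b * norm_phi
            if norm_phi > eps:  # N_S is undefined where grad Phi vanishes
                new_psi[i][j] = psi[i][j] + dt * b * (gx * px + gy * py) / norm_phi
    return new_phi, new_psi

# Phi = x - 2 (a vertical line S), Psi = y - 2 (a horizontal line S_hat) on a 5x5 grid
phi0 = [[j - 2.0 for j in range(5)] for i in range(5)]
psi0 = [[i - 2.0 for j in range(5)] for i in range(5)]
phi1, psi1 = coupled_step(phi0, psi0, lambda i, j: 1.0, dt=0.5)
```

Here $\hat{S}$ is orthogonal to $S$, so the normal velocity of $S$ is tangent to $\hat{S}$: $\Psi$ does not change while $\Phi$ moves, and the intersection point slides along $\hat{S}$ together with $S$.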



3 Examples and comments 

We now present examples of the proposed technique. We should note that: (a) the numerical implementation of both the flows for $\Phi$ and $\Psi$ follows the ordinary level-sets implementations developed by Osher and Sethian [17]; (b) recently introduced fast techniques like narrow bands, fast marching [18], or local methods [14] can also be used with the technique proposed here to evolve each one of the surfaces; (c) in the examples below, we compute a zero-order type of intersection between the implicit surfaces, meaning that we consider as part of the intersection every full voxel that both surfaces pass through (giving a jagged boundary). More accurate intersections can easily be computed using sub-divisions as in marching cubes. Recapping, the same numerical implementations used for the classical level-sets approaches are used to implement the deformation of $\Phi$ and $\Psi$, and finding the intersection is straightforward with algorithms like marching cubes.

⁴ To avoid the creation of spurious intersections during the deformation of $\Phi$ and $\Psi$, these functions can be re-initialized every few steps, as frequently done in the level-sets approach.

⁵ Note that although $S$ and $\hat{S}$ do not occupy the same regions in the n-dimensional space, their corresponding embedding functions $\Phi$ and $\Psi$ do have the same domain, making this velocity extension straightforward.
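The zero-order intersection of item (c) can be sketched in 2D, with grid cells playing the role of voxels (in 3D each voxel would have 8 corners instead of 4). This is an illustrative sketch with hypothetical names: a cell is kept whenever both $\Phi$ and $\Psi$ change sign (or vanish) at its corners, which is what produces the jagged boundary mentioned above.

```python
def changes_sign(values):
    """True if a zero level-set may pass through a cell with these corner values."""
    return min(values) <= 0.0 <= max(values)

def zero_order_intersection(phi, psi):
    """Cells (i, j), spanning grid nodes i..i+1 and j..j+1, crossed by both zero level-sets."""
    cells = set()
    for i in range(len(phi) - 1):
        for j in range(len(phi[0]) - 1):
            corners = [(i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1)]
            if (changes_sign([phi[a][b] for a, b in corners])
                    and changes_sign([psi[a][b] for a, b in corners])):
                cells.add((i, j))
    return cells

# Phi = x - 1.5 and Psi = y - 1.5 on a 4x4 grid: the two lines cross inside one cell
phi = [[j - 1.5 for j in range(4)] for i in range(4)]
psi = [[i - 1.5 for j in range(4)] for i in range(4)]
region = zero_order_intersection(phi, psi)
```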

Four examples are given in Figure 1 and Figure 2, one per column. In each 
example, the first figure on the top shows the original surface with the marked 
regions to be tracked (brighter regions), followed by three different time steps of 
the geometric deformation and region tracking. 

Figure 1 shows two toy examples. We track the painted regions on the surfaces while they are deforming with a morphing-type velocity [2,3]. ($\beta(X,t)$ is simply the difference between the current surface $\Phi(X,t)$ and a desired goal surface $\Phi(X,\infty)$, two separate surfaces and two merged balls respectively, thereby morphing the initial surface toward the desired one [3].) Note how the region of interest changes topology (it splits in the left example and merges in the right one).

Next, Figure 2 presents one of the main applications of this technique. Both these examples first show, on the top, a portion of the human cortex (white-matter/gray-matter boundary), obtained from MRI and segmented with the technique described in [19]. In order to visualize brain activity recorded via functional MRI in one of the non-visible folds (sulci), it is necessary to 'unfold' the surface while tracking the color-coded regions (surface unfolding or flattening has a number of applications in 3D medical imaging beyond fMRI visualization; see also [9]). In the first of these two examples (left column), the different gray values simply indicate the sign of the Gaussian curvature on the original surface (roughly indicating the sulci), while two arbitrary regions are marked in the last example (one of them with a big portion hidden inside the fold). We track each one of the colored regions with the technique described in this note. In the first column, $\beta(X,t) = (\operatorname{sign}(\kappa_1) + \operatorname{sign}(\kappa_2))\min(|\kappa_1|, |\kappa_2|)$, where $\kappa_1$ and $\kappa_2$ are the principal curvatures. In the second column, we use a morphing-type velocity as before [2,3] (in this case, the desired destination shape is a convex surface). See [9] for additional possible unfolding velocities, including volume- and area-preserving ones. The colors on the deforming surfaces then indicate, respectively, the sign of the Gaussian curvature and the two marked regions in the original surfaces. Note how the surface is unfolded, hidden regions are made visible, and the tracking of the color-coded regions allows one to find the matching places in the original 3D surface representing the cortex. This also allows one, for example, to quantify, for each tracked region, possible area/length distortions introduced by the flattening process. In order to track all the marked regions simultaneously in these two examples, we select the zero level-set of $\Psi$ to intersect the zero level-set of $\Phi$ at all these regions. If we have regions with more than two color codes to track, as will frequently happen in fMRI, we just use one auxiliary function $\Psi$ per color (region).
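The curvature-based velocity of the first column can be written as a small function of the principal curvatures. This sketch reads the formula with $(\operatorname{sign}(\kappa_1) + \operatorname{sign}(\kappa_2))$ as a single factor; the function name is hypothetical, and whether a positive $\beta$ moves the surface outward or inward depends on the normal orientation convention, which is an assumption here.

```python
def sign(v):
    """-1, 0 or +1."""
    return (v > 0) - (v < 0)

def unfolding_velocity(k1, k2):
    """beta = (sign(k1) + sign(k2)) * min(|k1|, |k2|) from the principal curvatures k1, k2."""
    return (sign(k1) + sign(k2)) * min(abs(k1), abs(k2))

convex = unfolding_velocity(1.0, 2.0)     # both curvatures positive
saddle = unfolding_velocity(-1.0, 2.0)    # negative Gaussian curvature
concave = unfolding_velocity(-3.0, -0.5)  # both curvatures negative
```

The velocity vanishes wherever $\kappa_1\kappa_2 < 0$ (negative Gaussian curvature) and only acts where both principal curvatures have the same sign, with magnitude set by the smaller one.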

The same technique can be applied to visualize lesions that occur on the 'hidden' parts of the cortex. After unfolding, the regions become visible, and the region tracking allows one to find their position in the original surface. When using level-sets techniques to deform two given shapes, one toward the other (a 3D cortex to a canonical cortex, for example), this technique can be used to find the region-to-region correspondence. This technique thus solves one of the basic shortcomings of the very useful level-sets approach.

Fig. 1. Two simple examples, one per column, of the algorithm introduced in this note (brighter regions are the ones being tracked), demonstrating possible topological changes of the tracked region.

Fig. 2. Unfolding the cortex, and tracking the marked regions, with a curvature-based flow and a 3D morphing one, left and right columns respectively.
Abstract. We explain how a discrete grey-level image can be numerically translated into a completely pixel-independent geometric structure made of oriented curves with grey levels attached to them. For that purpose, we prove that the Affine Morphological Scale Space of an image can be geometrically computed using a level set decomposition/reconstruction and a well-adapted curve evolution scheme. Such an algorithm appears to be much more accurate than classical pixel-based ones, and allows continuous deformations of the original image.



1 Introduction 

If a mathematician had to examine recent evolutions of image analysis, he would certainly notice a growing interest in geometric techniques, relying on the computation of differential operators like orientation, curvature, ... or on the analysis of more global objects like level curves. Of course Fourier or wavelet analysis are still very efficient for image compression, for example, but in order to analyze larger scales geometric approaches seem to be more relevant. At large scales a real-world image can hardly be considered, like a sound signal, as a superimposition of waves (or wavelets), since the main formation process relies on occlusion, which is highly nonlinear. This is not without some mathematical consequences: in this context, images are more likely to be represented in a geometrical space like $BV(\mathbb{R}^2)$, the space of functions on $\mathbb{R}^2$ with bounded variation, than in the more classical $L^2(\mathbb{R}^2)$ space. From a practical point of view,
the question of the numerical geometric representation of an image certainly 
deserves to be investigated, since images have been described so far by arrays of 
numbers (or wavelet/DCT coefficients for compressed images). It is likely that 
in the future alternative geometric descriptions will be commonly used, relying 
on some level-set/texture decomposition like the one proposed in [8]. 

In this paper, we show how it is possible to compute numerically a completely 
geometric and multiscale representation of an image, for which the notion of pixel 
disappears (though it can be recovered). Our algorithm is a fully geometrical implementation of the so-called Affine Morphological Scale Space (AMSS, see [1]), described in Sect. 2. Due to its contrast invariance, this scale space is equivalent to the affine curve shortening process described in [15], for which a fully geometrical algorithm has been recently proposed in [11]. A simplified version of this scheme, described in Sect. 3, makes it possible to process all level curves of an image with high precision in a couple of minutes. In association with level set decomposition
and reconstruction algorithms, described in Sect. 4, we thus compute the AMSS 
of an image with much more accuracy than classical scalar schemes, as is shown 
in Sect. 5. Another interest of this method is that it yields a contrast-invariant 
multiscale geometric representation of the image that provides a framework for 
geometry based analyses and processing. We illustrate this in Sect. 6 by applying 
our algorithm to image deformation. 



2 The Affine Morphological Scale Space

A natural way of extracting the geometry of an image consists in the level set decomposition inherited from Mathematical Morphology. Given an image $u$ viewed as an intensity map from $\mathbb{R}^2$ to $\mathbb{R}$, one can define the (upper) level sets of $u$ by

$$\chi_\lambda(u) = \{x \in \mathbb{R}^2 : u(x) \ge \lambda\}.$$

This collection of planar sets is equivalent to the function $u$ itself, since one has the reconstruction formula

$$u(x) = \sup\{\lambda : x \in \chi_\lambda(u)\}.$$

The main interest of this representation is its invariance under contrast changes: if $g$ is an increasing map from $\mathbb{R}$ to $\mathbb{R}$ (i.e. a contrast change), then one has

$$\chi_{g(\lambda)}(g(u)) = \chi_\lambda(u).$$

Hence, the collection of all level sets of an image does not depend a priori on the global illumination conditions of this image, and is thus an interesting geometrical representation.
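The decomposition and reconstruction formulas can be checked on a small discrete image. In this sketch (hypothetical helper names; pixels as (row, column) pairs), only the grey values present in the image are used as levels, so the supremum becomes a maximum; the last lines check contrast invariance with the increasing map $g(v) = 2v + 7$.

```python
def upper_level_set(u, lam):
    """chi_lambda(u) = {x : u(x) >= lambda}, as a set of (row, col) pixels."""
    return {(i, j) for i, row in enumerate(u) for j, v in enumerate(row) if v >= lam}

def decompose(u):
    """Map each grey value present in u to its upper level set."""
    return {lam: upper_level_set(u, lam) for lam in {v for row in u for v in row}}

def reconstruct(level_sets, rows, cols):
    """u(x) = sup{lambda : x in chi_lambda(u)}: visiting levels in increasing order,
    each pixel keeps the largest level whose set contains it."""
    u = [[None] * cols for _ in range(rows)]
    for lam in sorted(level_sets):
        for i, j in level_sets[lam]:
            u[i][j] = lam
    return u

u = [[0, 3, 3], [1, 5, 3], [0, 1, 0]]
restored = reconstruct(decompose(u), 3, 3)

g = lambda v: 2 * v + 7                      # an increasing contrast change
gu = [[g(v) for v in row] for row in u]
# chi_{g(lambda)}(g(u)) == chi_lambda(u) for every level lambda of u
same = all(decompose(gu)[g(lam)] == s for lam, s in decompose(u).items())
```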

Now, because an image generally contains details of different sizes, the notion of scale space has been introduced. It consists in representing an original image $u_0(\cdot)$ by a collection of images $(u(\cdot,t))_{t \ge 0}$ which are simplified versions of $u_0$, such that $u(\cdot,0) = u_0(\cdot)$ and, with increasing scale $t$, the $u(\cdot,t)$'s represent coarser and coarser versions of $u_0$. There are plenty of possibilities for such representations, but it is possible to reduce them by demanding strong invariance properties from the operator $T_t$ which transforms $u_0(\cdot)$ into $u(\cdot,t)$. In particular, it is possible to require the level set decomposition evoked above to be compatible with the scale-space representation, in the sense that the $\lambda$-level set of $u(\cdot,t)$ only depends on the $\lambda$-level set of $u_0$. If one asks, in addition, for other properties like regularity, semi-group structure, and Euclidean invariance (i.e. translation and rotation invariance), then according to [1] one reduces the possibilities to the choice of a nondecreasing continuous function $F$ governing the scale space given by



$$\frac{\partial u}{\partial t} = |Du|\,F\!\left(\operatorname{div}\frac{Du}{|Du|}\right), \qquad (1)$$

where $Du$ represents the spatial gradient of $u$. In this paper we have chosen the particular case of the Affine Morphological Scale Space, given by $F(s) = s^{1/3}$, for two main reasons:

First, it yields an interesting additional invariance property called affine invariance:

$$T_t(u_0 \circ \phi) = (T_t u_0) \circ \phi \quad \text{for any } \phi(x) = Ax + b,\ A \in SL(\mathbb{R}^2),\ b \in \mathbb{R}^2. \qquad (2)$$

This property makes it possible to perform affine-invariant shape recognition under occlusions (see [6]) and to compute local affine-invariant features like affine curvature, for example.

Second, there exists a fully consistent geometric scheme (see [11]) for solving the level curve evolution induced by the AMSS,

$$\frac{\partial C}{\partial t} = \kappa^{1/3}\,N. \qquad (3)$$

Here $C$ is any point of a level curve, $\kappa$ the local curvature and $N$ the normal vector at this point. In particular, this scheme guarantees that the inclusion of any two sets is preserved by the evolution (inclusion principle). In the simplified version described in Sect. 3, it is fast (linear complexity) and robust, as only areas and middle points are computed.



3 A Fast Geometric Scheme 

The numerical implementation of the affine scale space of a curve given by (3) can 
be realized in several ways. For our purpose, an ideal algorithm should satisfy, 
up to a given computer precision, the following properties : 

P1: preserve inclusion, which is necessary for level set reconstruction;

P2: be affine invariant, since the scale-space is; 

P3: have linear complexity, so that all level curves of an image can be processed 
with a high precision in a reasonable time. 

Of course, algorithms based on scalar formulations (see [16]) are not relevant here, since our goal is precisely to get rid of pixel-based representations. In any case, such algorithms satisfy neither P1 nor P2, and are not computationally efficient (in terms of time and memory) if high precision is needed (e.g. 100 points per original pixel). The purpose of this paper is to present a scheme opposite to Sethian's formulation¹, since we want to solve a scalar (contrast-invariant) evolution equation with a geometric algorithm. Nor can we use local point evolution schemes (naive ones or more refined ones like in [10]), since they do not guarantee P1 (because they rely on a local estimation of the curvature that is not directly connected with a global property like inclusion), and for the same reason P2 would be uncertain (and might depend on the discretization). This is why we started from the geometric scheme described in [11], which satisfies P1 and P2. This scheme is based on a simple operator called affine erosion, which we define now.

Fig. 1. σ-affine erosion ( ) of a convex non-closed curve ( — ).

Consider a (not necessarily closed) convex parameterized curve $C : [a,b] \to \mathbb{R}^2$ and an area parameter $\sigma$. Define a $\sigma$-chord set of $C$ as the region with area $\sigma$ that is enclosed by a segment $[C(s_1)C(s_2)]$ and the corresponding piece of curve $C([s_1,s_2])$. Then, the $\sigma$-affine erosion of $C$, written $E_\sigma(C)$, can be defined as the segment-side boundary of the union of these $\sigma$-chord sets (see Fig. 1). Without going into details (which can be found in [11] and [12]), we briefly recall two main results about the affine erosion:

First, this boundary is essentially made of the middle points of the segments $[C(s_1)C(s_2)]$ defining the $\sigma$-chord sets (minus some "ghost parts" that do not appear in general, plus two end segments if the curve is not closed).

Second, the infinitesimal iteration of such an operator asymptotically yields the affine scale space of the initial curve $C_0$ (with fixed endpoints if $C_0$ is not closed): one has

$$(E_\sigma)^n(C_0) \to C(\cdot,t) \quad \text{as } n \to +\infty,\ \sigma \to 0 \ \text{ and } \ \frac{1}{2}\left(\frac{3}{2}\right)^{2/3} n\,\sigma^{2/3} \to t,$$

where $C(\cdot,t)$ is defined from $C_0$ by (3).
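A sketch of the σ-chord midpoints for a convex open polygon follows. It is a simplified discrete variant written for illustration, not the scheme of [11]: chords start only at vertices, the second endpoint is found by sliding along the polygon until the enclosed area reaches $\sigma$, and ghost parts and end segments are ignored (the function names are hypothetical). For the parabola $y = x^2$, a chord set of area $\sigma$ has horizontal width $(6\sigma)^{1/3}$, so the midpoints lie on the translated parabola $y = x^2 + (6\sigma)^{2/3}/4$; dividing this shift by the vertical speed $2^{1/3}$ of the parabola under (3) gives $t = \frac{1}{2}(3/2)^{2/3}\sigma^{2/3}$ per step, consistent with the normalization above.

```python
def _cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def affine_erosion_midpoints(curve, sigma):
    """For every vertex P_i of a convex open polyline, find the point Q further along the
    curve such that the region between the chord [P_i Q] and the curve has area sigma
    (sigma > 0), and return the chord midpoints."""
    mids = []
    for i in range(len(curve) - 1):
        area = 0.0  # area enclosed by the chord [P_i P_k] and the curve from P_i to P_k
        for k in range(i + 1, len(curve) - 1):
            # extending the chord set from P_k to P_{k+1} adds the triangle (P_i, P_k, P_{k+1})
            tri = abs(_cross(curve[i], curve[k], curve[k + 1])) / 2
            if area + tri >= sigma:
                # the added area grows linearly as Q slides from P_k to P_{k+1}
                frac = (sigma - area) / tri
                qx = curve[k][0] + frac * (curve[k + 1][0] - curve[k][0])
                qy = curve[k][1] + frac * (curve[k + 1][1] - curve[k][1])
                mids.append(((curve[i][0] + qx) / 2, (curve[i][1] + qy) / 2))
                break
            area += tri
    return mids

# Finely sampled parabola y = x^2 on [-2, 2]
curve = [(x, x * x) for x in (-2 + 4 * m / 2000 for m in range(2001))]
sigma = 0.01
mids = affine_erosion_midpoints(curve, sigma)
shift = (6 * sigma) ** (2 / 3) / 4  # predicted upward translation of the parabola
```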

The scheme presented in [11] relies on a more general definition of the affine erosion that also applies to non-convex curves. Compared to the convex case, the computations are more complex since curve intersections may need to be computed, which requires careful programming and makes the complexity of the algorithm quadratic in the number of vertices of the polygonal curve to be processed.

¹ The main interest of scalar formulations is that they naturally handle topological changes of the level sets; it has however been proved in [2] that no such changes occur for the affine scale space.

This is why, as suggested in [11], we have chosen an alternative scheme based on the separate treatment of each convex part. Given a possibly non-convex and not necessarily closed polygonal curve, we iterate the following three-step process:

1. Break the curve into convex components (by "cutting" inflection segments at their middle point), and compute the minimum value $\sigma_{\min}$ of the area of any non-closed component.

2. Define $\sigma_{\mathrm{real}} = \min(\sigma_{\min}, \sigma)$ and apply a discrete affine erosion of area $\sigma_{\mathrm{real}}$ to each convex component.

3. Concatenate the obtained pieces of curves in order to obtain a new (possibly non-convex) curve.
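The bookkeeping around the erosion in the three steps above can be sketched for an open polygon. The names are hypothetical and the erosion of step 2 itself is left out: the sketch cuts the curve at the middle of each inflection segment (an edge joining turns of opposite sign), computes $\sigma_{\mathrm{real}} = \min(\sigma_{\min}, \sigma)$ with the shoelace formula, and glues the pieces back together, assuming the junction points stay fixed.

```python
def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def split_convex(points):
    """Step 1: cut an open polyline (>= 2 points) into convex pieces at inflection midpoints."""
    pieces, current, prev_sign = [], [points[0]], 0
    for i in range(1, len(points) - 1):
        turn = _cross(points[i - 1], points[i], points[i + 1])
        sign = (turn > 0) - (turn < 0)
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            # the edge points[i-1] -> points[i] joins two turns of opposite sign
            mid = ((points[i - 1][0] + points[i][0]) / 2,
                   (points[i - 1][1] + points[i][1]) / 2)
            pieces.append(current + [mid])
            current = [mid]
        current.append(points[i])
        if sign != 0:
            prev_sign = sign
    current.append(points[-1])
    pieces.append(current)
    return pieces

def chord_area(piece):
    """Area between an open convex piece and the chord joining its endpoints (shoelace)."""
    s = 0.0
    for k in range(len(piece)):
        (x1, y1), (x2, y2) = piece[k], piece[(k + 1) % len(piece)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

def effective_sigma(pieces, sigma):
    """Step 2 (area only): sigma_real = min(sigma_min, sigma)."""
    return min([sigma] + [chord_area(p) for p in pieces])

def concatenate(pieces):
    """Step 3: glue consecutive pieces, dropping each duplicated junction point."""
    curve = list(pieces[0])
    for p in pieces[1:]:
        curve.extend(p[1:])
    return curve

pieces = split_convex([(0, 0), (1, 1), (2, 1), (3, 2)])  # one inflection on edge (1,1)-(2,1)
```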

This approach yields a good approximation of the exact affine erosion of $C$. The main reason is that near an inflection point of $C$ the curve is locally flat; thus, for reasonable values of $\sigma$, the tangents at the inflection point do not evolve much during step 2. This ensures that the convex pieces fit together nicely at step 3 and that the whole three-step process described above is not very sensitive to the precise location at which the curve is cut. These are informal observations rather than theoretical statements, but for reasonable values of $\sigma$ both evolutions indeed give the same result (see [12]). The main advantage of this simplified algorithm is that it is fast (it has linear complexity) and robust, since each evolution is obtained from middle points of segments whose endpoints lie on the original curve and whose selection only relies on an area computation.



4 The Complete Algorithm 

Our geometric multiscale representation algorithm for numerical images is made 
out of the following 5 steps. 

Step 1: decomposition. 

The level set extraction is done currently in a straightforward way. Using 4- 
connectedness, we extract, for each grey value which appears in the image, the 
corresponding level sets (upper or lower) and keep the oriented border such that 
the level set lies on the left of the curve. We thus obtain a set of curves which 
are either closed or start and end on the image border. For each curve we keep 
the associated grey level, so that we have a representation that is completely 
equivalent to the initial image. It is important to notice that no interpolation is 
made to extract these level curves : the pixels are simply considered as adjacent 
squares. 

Step 2: symmetrization. 

Then, in order to get Neumann boundary conditions for the affine scale space, we can symmetrize the level lines which start and end on the border, which guarantees that the curve will remain orthogonal to the image border during the evolution. Curves with endpoints on the same side are reflected once, and curves with endpoints on adjacent sides are reflected twice, thus yielding closed curves. Finally, curves with endpoints on opposite sides are reflected once at each side (they should, theoretically, be reflected infinitely many times, but in practice once is enough). Without this symmetrization, the endpoints of the non-closed level curves would remain fixed.
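The reflections of Step 2 can be sketched on vertex lists. This is an illustrative sketch with hypothetical names, assuming the image border lies on lines $x = c$ or $y = c$ and that the relevant endpoints lie exactly on them; a closed curve is returned as a vertex list whose closing edge is implicit.

```python
def reflect(p, axis, c):
    """Mirror a point across the line x = c (axis 'x') or y = c (axis 'y')."""
    x, y = p
    return (2 * c - x, y) if axis == "x" else (x, 2 * c - y)

def extend_by_reflection(curve, axis, c):
    """Prepend the mirror image of a curve whose first point lies on the mirror line."""
    mirrored = [reflect(p, axis, c) for p in reversed(curve)]
    return mirrored[:-1] + curve  # the last mirrored point coincides with curve[0]

def close_by_reflection(curve, axis, c):
    """Close a curve whose two endpoints lie on the mirror line."""
    mirrored = [reflect(p, axis, c) for p in reversed(curve)]
    return curve + mirrored[1:-1]  # both endpoints coincide with their reflections

# Endpoints on the same side (x = 0): one reflection closes the curve
same_side = close_by_reflection([(0, 1), (2, 2), (0, 3)], "x", 0)
# Endpoints on adjacent sides (x = 0, then y = 0): two reflections
ext = extend_by_reflection([(0, 2), (1, 1), (3, 0)], "x", 0)
adjacent = close_by_reflection(ext, "y", 0)
```

For endpoints on opposite sides, `extend_by_reflection` would be applied once at each end, which leaves the curve open, as in the text.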

Step 3 : AMSS. 

At this stage, we process all level curves with the geometric implementation 
of the affine scale space described in Sect. 3. This involves two parameters : 
the scale of evolution $t$ and the precision $\varepsilon$ at which curves are computed. We normalize $\varepsilon$ such that $1/\varepsilon$ corresponds to the number of points that will be used to describe a one-pixel-length curve.

Step 4: geometric transformation and/or computation. 

Once steps 1 to 3 are achieved, we have a smooth geometric description of the image that allows one to perform any geometric transformation and/or computation. We shall give an example of image deformation in Sect. 6, but there are many other
possibilities of geometric processing. For example, one can simply remove level 
sets that are too small, or too oscillatory (see [13]), or satisfying any geometric 
criterion that can be estimated on a smooth curve. 

Step 5: rasterization and reconstruction. 

After transforming the level lines of the initial image, we need to reconstruct the 
corresponding image. This is done by filling in the level sets and using the grey 
level information associated with each level line. This step is more complex than 
the extraction of level lines, since now the level curves are made of points with 
non-integer coordinates, and thus we have to decide whether a pixel is inside 
the set or not. We first rasterize the curves using an adaptation of Bresenham's algorithm² that satisfies the inclusion principle. In this algorithm, a pixel is assigned to the side of the curve that contains more than half of its area (see Fig. 2), and a special treatment for very close (sub-pixel) points is added.




Fig. 2. Example of rasterization of a floating-point curve ( — ) into a pixel separating 
integer curve ( ). 



² Bresenham's algorithm is a well-known algorithm from computer graphics which joins two points that have floating-point coordinates and are more than a couple of pixels apart (see [3]).
Our implementation gives a good approximation of the real curves: only some very small, sub-pixel details of the eroded curve might be lost during the rasterization, such as curves enclosing an area smaller than one pixel.
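The half-area rule can be illustrated by estimating, for a closed polygon, the fraction of a unit pixel $[p_x, p_x+1] \times [p_y, p_y+1]$ that lies inside it. This sketch uses point sampling and an even-odd point-in-polygon test, a swapped-in technique chosen for brevity with hypothetical names; it is not the Bresenham adaptation described above and does not include the special treatment of very close points.

```python
def point_in_polygon(x, y, poly):
    """Even-odd rule: count crossings of a horizontal ray going right from (x, y)."""
    inside = False
    n = len(poly)
    for k in range(n):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def pixel_inside(px, py, poly, sub=16):
    """True if more than half of the unit pixel at (px, py) lies inside poly,
    with the area estimated from sub x sub sample points."""
    hits = sum(point_in_polygon(px + (a + 0.5) / sub, py + (b + 0.5) / sub, poly)
               for a in range(sub) for b in range(sub))
    return 2 * hits > sub * sub

square = [(0, 0), (2.6, 0), (2.6, 2.6), (0, 2.6)]
border_pixel = pixel_inside(2, 0, square)  # about 60% of this pixel is inside
```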



Complexity of the Algorithm 

The complexity of the different steps of the algorithm depends on the following 
parameters : N, the number of pixels contained in the original image (typically 
10®) ; G, the number of grey-levels contained in the original image (typically 
256); $\varepsilon$, the precision ($1/\varepsilon$ being the number of points per pixel) at which the level curves need to be computed (typically from 1/100 to 1/2); $t$, the scale at which the level curves are smoothed; and $N'$, the number of pixels contained in the reconstructed
image. Table 1 gives upper bounds of time and memory complexity for the steps 
described above. 





Step | Computation time | Memory
decomposition | N × G | N × G
affine scale space | N × G × t/ε | N × G/ε
rasterization | N × G/ε | N × G/ε
reconstruction | N' × G | N' × G

Table 1. Complexity of the proposed algorithm



Notice that the upper bound of N × G points to describe all level lines is very rough. For the classical Lena image (see Fig. 5), one has N = 256² and G = 238, so that N × G ≈ 1.56 × 10⁷, whereas the decomposition yields, in 10 seconds, about 48000 curves and 1.1 million points. Then the affine scale space computation for ε = 1/2 takes 2.5 minutes and yields 13000 curves and 0.78 million points. The final rasterization and reconstruction takes 10 seconds.

5 Comparison with Scalar Schemes 

The purpose of this section is to compare the geometric algorithm we proposed with explicit scalar schemes based on the iteration of a process like

$$u^{n+1} = u^n + \Delta t\,|Du^n|\left(\operatorname{div}\frac{Du^n}{|Du^n|}\right)^{1/3},$$

where $Du^n$ and div are computed either by using finite differences on a 3×3 neighborhood of the current pixel (see [7]) or by using a non-local estimation of the image derivatives, as obtained by a Gaussian convolution (see [14]). Such a scheme is strongly limited by the grid: the localization of the level curves is known up to a precision of the order of the pixel size (even when interpolation is used), and affine invariance could only be achieved at large scales (but even rotation invariance is difficult to ensure at all scales, as noticed in [14]). Another striking side effect of such scalar schemes is that they necessarily produce artificial diffusion in the gradient direction. In other words, a scalar scheme cannot be contrast invariant, and in practice new grey levels (and consequently new level curves) are created. The reason is the following: for a purely contrast-invariant algorithm defined on a fixed grid, a point of a level curve either does not move or moves by at least one pixel. This constraint causes small (i.e. large-scale) curvatures to be treated as zero, and is for that reason incompatible with a curvature-driven evolution like AMSS.
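For comparison, a scalar scheme of the kind discussed above can be sketched with 3×3 finite differences. It uses the identity $|Du|\,(\operatorname{div}(Du/|Du|))^{1/3} = (u_{xx}u_y^2 - 2u_{xy}u_xu_y + u_{yy}u_x^2)^{1/3}$ with a real (signed) cube root, so no division by $|Du|$ is needed. The step size, the clamped boundary handling and the names are assumptions of this sketch; it is not the specific implementation of [7] or [14].

```python
import math

def cbrt(v):
    """Real (signed) cube root."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

def amss_scalar_step(u, dt):
    """One explicit step u <- u + dt * (u_xx u_y^2 - 2 u_xy u_x u_y + u_yy u_x^2)^(1/3),
    with 3x3 central differences (x along columns, y along rows) and clamped indices."""
    rows, cols = len(u), len(u[0])

    def at(i, j):
        return u[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    out = [row[:] for row in u]
    for i in range(rows):
        for j in range(cols):
            ux = (at(i, j + 1) - at(i, j - 1)) / 2
            uy = (at(i + 1, j) - at(i - 1, j)) / 2
            uxx = at(i, j + 1) - 2 * at(i, j) + at(i, j - 1)
            uyy = at(i + 1, j) - 2 * at(i, j) + at(i - 1, j)
            uxy = (at(i + 1, j + 1) - at(i + 1, j - 1)
                   - at(i - 1, j + 1) + at(i - 1, j - 1)) / 4
            out[i][j] = u[i][j] + dt * cbrt(uxx * uy ** 2 - 2 * uxy * ux * uy + uyy * ux ** 2)
    return out

# A bump whose upper level sets are discs: the discs shrink, so u decreases off-center
bump = [[-((i - 3) ** 2 + (j - 3) ** 2) for j in range(7)] for i in range(7)]
out = amss_scalar_step(bump, 0.1)
```

In this example the pixel at (2, 3) moves from −1 to about −1.2, a grey value absent from the integer-valued input, which illustrates on a small scale how such a scheme creates new grey levels.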

These effects are illustrated in Fig. 3. We have chosen an image of size 100×70, which has been magnified 10 times and quantized to only 10 grey levels spaced 20 apart (see the first row). The left column presents the image, the right column the level lines. In the second row we show the result of our algorithm: no level sets (i.e. grey levels) are created and the level lines are smoothed. The last row presents the result of a classical scalar algorithm. As expected, this scheme produces artificial scalar diffusion and creates new grey levels, thus causing a multiplication of the level lines. This can be seen in the left half of the level-lines image, where 5-spaced level lines are represented; in the right part, 20-spaced level lines are represented, which should be the only ones present. One can also notice some anisotropy in this spurious diffusion: it is weaker along the directions aligned with the grid (i.e. horizontal or vertical directions), which reinforces the visual perception of this grid.



6 Applications 

6.1 Visualization of Level Curves 

The level sets of an image are generally so irregular that only a few of them can be visualized at the same time. Extracting and processing each level curve of an image independently provides a useful tool to clearly visualize the level lines of a given image, as illustrated in Fig. 5. In that image we can see all 4-spaced level lines of the Lena image, thanks to the smooth geometric representation provided by the geometric Affine Morphological Scale Space. Such a superimposition shows interesting shape information about the geometric structure of the image.



6.2 Image Deformation 

In this part, we show how our algorithm can be used to apply a geometric transform to an image. In the experiments that follow, projective or affine transforms are used, but more complex geometric transforms will work as well. Let $u(i,j)$ be a given discrete image; how can one define an approximation (or interpolation) $\tilde{u}$ of $u$ that allows one to build the transformed image

$$v(i,j) = \tilde{u}\left(\frac{ai + bj + c}{di + ej + 1},\ \frac{fi + gj + h}{di + ej + 1}\right),$$

where a, b, c, d, e, f, g, h are given coefficients (that may vary)? One can distinguish between two kinds of methods. The first are continuous (and explicit) methods, for which a continuous model for $u$ is explicitly given and computed from $u$ once and for all. The second are discrete (and implicit) methods, for which $\tilde{u}$ is implicitly defined and must be estimated for each discrete grid.

For example, zero-order interpolation, defined by $\tilde{u}(i,j) = u([i + 1/2], [j + 1/2])$, where $[x]$ represents the integer part of $x$, bilinear interpolation and higher-order generalizations are explicit methods. By contrast, image interpolation/approximation algorithms based on the minimization of a certain error between discrete images (like the Total Variation used in [9]) are implicit methods. In fact, this distinction is rather formal if the practical criterion is not "how is $\tilde{u}$ defined?" but "how much time does it take to compute $\tilde{u}$?". For example, the Fourier representation is an explicit method, but for non-Euclidean transformations it is computationally expensive: if N is the number of pixels of the original image $u$, it requires N operations to compute the value of $\tilde{u}$ at a given point. From that point of view, our representation is a compromise between computation time (once the level lines have been extracted and smoothed, the deformation and reconstruction processes are fast) and accuracy (the geometry of the level sets is precisely known). We do not claim that the Affine Morphological Scale Space yields the best image approximation: it is geometrically better than bilinear interpolation (for which pixelization effects remain), but less accurate than sophisticated image interpolation algorithms like [9]. However, we showed that it can be computed precisely in a reasonable time, and that it then allows any kind of geometric deformation.
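The two explicit methods used for comparison can be sketched as follows: a projective map with the coefficients a, b, c, d, e, f, g, h of the formula above, zero-order interpolation $\tilde{u}(x,y) = u([x + 1/2], [y + 1/2])$ (with $[\cdot]$ taken as the floor), and bilinear interpolation. Clamping at the image border and the helper names are assumptions of this sketch.

```python
import math

def projective_map(i, j, coeffs):
    """(i, j) -> ((ai + bj + c) / (di + ej + 1), (fi + gj + h) / (di + ej + 1))."""
    a, b, c, d, e, f, g, h = coeffs
    den = d * i + e * j + 1
    return (a * i + b * j + c) / den, (f * i + g * j + h) / den

def zero_order(u, x, y):
    """u([x + 1/2], [y + 1/2]) with [.] the floor, clamped to the image."""
    rows, cols = len(u), len(u[0])
    i = min(max(math.floor(x + 0.5), 0), rows - 1)
    j = min(max(math.floor(y + 0.5), 0), cols - 1)
    return u[i][j]

def bilinear(u, x, y):
    """Bilinear interpolation of u at (x, y), clamped to the image (needs >= 2x2 pixels)."""
    rows, cols = len(u), len(u[0])
    x = min(max(x, 0.0), rows - 1.0)
    y = min(max(y, 0.0), cols - 1.0)
    i, j = min(int(x), rows - 2), min(int(y), cols - 2)
    tx, ty = x - i, y - j
    return ((1 - tx) * (1 - ty) * u[i][j] + tx * (1 - ty) * u[i + 1][j]
            + (1 - tx) * ty * u[i][j + 1] + tx * ty * u[i + 1][j + 1])

def warp(u, coeffs, interp):
    """v(i, j) = interp(u, projective_map(i, j)) on the same grid as u."""
    return [[interp(u, *projective_map(i, j, coeffs)) for j in range(len(u[0]))]
            for i in range(len(u))]

u = [[10 * i + j for j in range(4)] for i in range(4)]
half_shift = (1, 0, 0.5, 0, 0, 0, 1, 0)  # x = i + 0.5, y = j
v_zero, v_bil = warp(u, half_shift, zero_order), warp(u, half_shift, bilinear)
```

With a half-pixel shift, zero-order interpolation jumps to the next row while bilinear interpolation averages the two rows, which is the trade-off between pixelization and diffusion discussed in the text.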

We compared deformations yielded by our method, zero and bilinear inter- 
polation on two images : 

On a simple binary image (left in Fig. 4), we applied an affine deformation using three different methods: 1. a bilinear interpolation (left part of middle image); 2. a zero-order interpolation (right part of middle image); 3. the geometric representation described in this paper (right-hand image). To save space, we have put the zero-order and bilinear interpolations in the same picture. Contrary to classical methods, a geometric curve shortening quickly provides a good compromise between pixelization effects, accuracy and diffusion effects.

In Fig. 6 we present a satellite image from which we have simulated a pro- 
jective view (from right to left as indicated by the black trapezoid). Fig. 7 left 
shows the results with zero-order interpolation (left part) and bilinear interpo- 
lation (right part). Our algorithm, using the geometric implementation of the 
affine morphological scale space gives the result shown in Fig. 7, right hand 
image. 

7 Conclusion 

In this paper, we described how the Affine Morphological Scale Space of an 
image can be implemented in a geometric manner. Compared to classical scalar 
schemes, the main advantages are a much higher accuracy both in terms of 
image definition and in terms of fidelity to the scale space properties (contrast invariance and affine invariance). The algorithm needs a large amount of memory but is still rather fast, and the representation it induces also allows very fast geometric image deformations and contrast changes.

Our method relies on a level set decomposition/reconstruction and on a par- 
ticular geometric algorithm for affine curve shortening, but it could be gen- 
eralized to other curve evolutions, as similar geometric algorithms for general 
curvature driven curve evolutions begin to appear (see [4]). Another generaliza- 
tion could be made by using some image interpolation for the extraction of the 
level sets: however, in this case the representation will generally no longer be contrast-invariant. A more geometric extension of the algorithm relying on the
interpolation of new level lines using the Absolute Minimizing Lipschitz Exten- 
sion (see [5]) could also be investigated for visualization tasks. 
Abstract. The classical morphological segmentation paradigm is based
on the watershed transform, constructed by flooding the gradient im- 
age seen as a topographic surface. For flooding a topographic surface, 
a topographic distance is defined from which a minimum distance algo- 
rithm is derived for the watershed. In a continuous formulation, this is 
modeled via the eikonal PDE, which can be solved using curve evolution 
algorithms. Various ultrametric distances between the catchment basins 
may then be associated to the flooding itself. To each ultrametric dis- 
tance is associated a multiscale segmentation; each scale being the closed 
balls of the ultrametric distance. 



1 Introduction 

Segmentation is one of the most challenging tasks in image processing, as it requires to some extent a semantic understanding of the image. The morphological segmentation paradigm, based on the watershed transform and markers, has been extremely successful, both for interactive and for automatic segmentation. Its principle is simple: a) a gradient image of the scene is constructed; b) for each object of interest, an inside particle (marker) is detected, either automatically or interactively; c) the watershed associated with the markers is constructed. Its advantage is robustness: the result is independent of the shape or the placement of the markers in the zones of interest. The result is obtained by a global minimization involving both the topography of the surface and the complete set of markers.

This paradigm has met its limits with the emergence of new segmentation tasks in the communications and multimedia industries. The development of games, teleworking, teleshopping, television on demand, videoconferences, etc. has multiplied the situations where images and sequences have not only to be transmitted but also manipulated, selected and assembled in new ways. This evolution is most challenging for segmentation techniques: complex sequences of color images must be segmented in real time, automatically, while still allowing user interaction.

Object-oriented coding represents an even greater challenge for segmentation techniques. Such encoders segment the scene into homogeneous zones for which contours, motion and texture have to be transmitted. Depending upon the targeted bitstream and the complexity of the scene, a variable number of regions has to be transmitted. An automatic segmentation with a variable number of regions is therefore required for sequences whose content, or even content type, is not known a priori. In this case there is no way to devise a strategy for finding markers, and as a consequence the traditional morphological segmentation based on watershed and markers fails.

This situation has triggered the development of new techniques of multiscale segmentation, where no markers are required. In such cases it is of interest to construct a sequence of nested partitions going from coarse to fine, in which each boundary of a coarse segmentation is also a boundary of all finer segmentations. We will call such a series of nested partitions a multiscale cube (we do not call it a pyramid, as the resolution of the images is not reduced when going from fine to coarse). Such a multiscale cube may be used in various ways:

- choose in the cube a slice with the appropriate number of regions;

- compose a segmentation by extracting regions from different slices of the cube. This may be done interactively. It may also result from minimizing some global criterion (for instance, if a texture model is adopted for each region, it is possible to measure the distance between the model and the original image in each region; one may then minimize a weighted sum of the length of the contours and of the global distortion of the image);

- use the cube for defining new dissimilarity measures between adjacent catchment basins, which may be used for segmenting with markers and yield better results than the traditional segmentation with markers based on the altitude of the gradient.

In the absence of any knowledge of the image content, it is important to find good psychovisual criteria for constructing the cube.

In this paper, we first discuss monoscale watershed segmentation by flooding, both from a discrete formulation based on the shortest topographic distance and from a continuous viewpoint based on the eikonal PDE and curve evolution. Then, for multiscale segmentation, we use ultrametric distances to generalize the flooding and improve the segmentation.



2 The classical morphological segmentation paradigm 

2.1 Flooding a topographic surface 

The classical morphological tool for segmentation is the watershed transform. For segmenting an image $f$, its edges are first enhanced by computing its gradient magnitude $\|\nabla f\|$. This is approximated by the discrete morphological gradient $\delta(f) - \varepsilon(f)$, where $\delta(f) = f \oplus B$ is the flat dilation of $f$ by a small disk $B$ and $\varepsilon(f) = f \ominus B$ is the flat erosion of $f$ by $B$. After the edge enhancement, the segmentation process starts by creating flooding waves that emanate from a set of markers (feature points inside desired regions) and flood the topographic surface $\|\nabla f\|$. The points where these flooding waves meet each other form the segmentation boundaries. The simplest markers are the regional minima of the gradient image. Very often, the minima are extremely numerous, leading to an oversegmentation. For this reason, in many practical cases, the watershed will take as sources of the flooding a smaller set of markers, which have been identified by a preliminary analysis step as inside germs of the desired segmentation.
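The discrete morphological gradient $\delta(f) - \varepsilon(f)$ can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the small disk $B$ is approximated by a $(2r+1)\times(2r+1)$ square, and the helper names `_window_extrema` and `morphological_gradient` are hypothetical, not taken from the paper.

```python
import numpy as np


def _window_extrema(f, r, reducer):
    """Apply `reducer` (np.maximum or np.minimum) over a (2r+1)x(2r+1) square window."""
    h, w = f.shape
    # Edge padding is equivalent to restricting the window to the image domain.
    padded = np.pad(f, r, mode="edge")
    out = padded[r:r + h, r:r + w].copy()
    for di in range(2 * r + 1):
        for dj in range(2 * r + 1):
            out = reducer(out, padded[di:di + h, dj:dj + w])
    return out


def morphological_gradient(f, r=1):
    """delta(f) - epsilon(f): flat dilation minus flat erosion by a square B (assumption)."""
    f = np.asarray(f, dtype=float)
    dilation = _window_extrema(f, r, np.maximum)  # f (+) B
    erosion = _window_extrema(f, r, np.minimum)   # f (-) B
    return dilation - erosion


step = np.array([[0, 0, 5, 5]], dtype=float)
print(morphological_gradient(step))  # [[0. 5. 5. 0.]]
```

The gradient is non-zero only on the two pixels adjacent to the step, which is exactly where the flooding waves are expected to meet.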

2.2 Modifying a topographic surface: Swamping 

In the case where the sources for the flooding are not all minima of the topographic surface, two solutions are possible. The first is to use the markers as sources. In this case, catchment basins without sources are flooded from already flooded neighbouring regions. Such a flooding algorithm, using hierarchical queues, has been described in [1].

The second solution consists in modifying the topographic surface as slightly as possible, in such a way that the markers become its only regional minima. This operation is called swamping. If $m_1, m_2, \ldots, m_k$ are the binary markers, we construct a marker function $g$ defined as follows: $g$ = White outside the markers and $g$ = Black inside the markers. On the other hand, the topographic surface $f$ is modified by assigning the value Black to all regional minima. We then perform a closing by reconstruction of $f$ from the marker function $g$. This can be accomplished by an iterative algorithm which at each iteration forms a conditional erosion, i.e., a supremum ($\vee$) of the erosion of the previous iterate and the original function:

$$g_0 = g \vee f, \qquad g_k = \varepsilon(g_{k-1}) \vee f, \quad k = 1, 2, 3, \ldots$$

In the limit as $k \to \infty$ we obtain the function $g_\infty$, which is the result of the closing by reconstruction. This new function is as similar as possible to the function $f$, except that its only regional minima are the family $\{m_i\}$. Hence, its catchment basins will give the desired segmentation.
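The iteration $g_k = \varepsilon(g_{k-1}) \vee f$ can be run literally until it stops changing. The sketch below is illustrative and assumes a 3x3 square structuring element, Black = 0 and White = $\max f + 1$; the names `flat_erosion` and `swamp` are hypothetical. It also skips the preliminary step of setting all regional minima of $f$ to Black, which does not change the result in this small example.

```python
import numpy as np


def flat_erosion(f, r=1):
    """Flat erosion by a (2r+1)x(2r+1) square; edge padding keeps the window in the image."""
    h, w = f.shape
    padded = np.pad(f, r, mode="edge")
    out = f.copy()
    for di in range(2 * r + 1):
        for dj in range(2 * r + 1):
            out = np.minimum(out, padded[di:di + h, dj:dj + w])
    return out


def swamp(f, markers, black=0.0, max_iter=10_000):
    """Closing by reconstruction of f from the marker function g (Black in, White out)."""
    f = np.asarray(f, dtype=float)
    white = f.max() + 1.0                       # any value above max(f) acts as White
    g = np.where(markers, black, white)
    g_k = np.maximum(g, f)                      # g_0 = g v f
    for _ in range(max_iter):
        nxt = np.maximum(flat_erosion(g_k), f)  # conditional erosion
        if np.array_equal(nxt, g_k):
            return nxt                          # g_infinity reached
        g_k = nxt
    raise RuntimeError("reconstruction did not converge")


f = np.array([[2, 0, 2, 3, 1, 3, 2, 0, 2]], dtype=float)
markers = np.zeros_like(f, dtype=bool)
markers[0, [1, 7]] = True                       # keep only the minima at columns 1 and 7
print(swamp(f, markers))  # [[2. 0. 2. 3. 3. 3. 2. 0. 2.]]
```

The unmarked minimum at column 4 is filled up to its lowest pass (altitude 3), so only the two marked minima remain.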

3 Watershed Segmentation: Discrete and Continuous 

3.1 Discrete Watershed and Topographic Distance 

We first consider images in a digital framework. Images are represented on regular graphs where the nodes represent the pixels and the edges the neighborhood relations. A connected component of uniform grey tone is called a plateau. A plateau without lower (resp. higher) neighbors is a regional minimum (resp. maximum).

Let us now consider a drop of water falling on a topographic surface $f$ for which the regional minima are the only plateaus. If it falls outside a plateau, it will glide along a path of steepest descent. If the altitude of a pixel $x$ is $f(x)$, the altitude of its lowest neighbor defines the erosion $\varepsilon(f)(x)$ of size 1 at pixel $x$. Hence the altitude of the steepest descending slope at pixel $x$ is $\mathrm{slope}(x) = f(x) - \varepsilon(f)(x)$. If $x$ and $y$ are two neighboring pixels, we define the topographic variation $\mathrm{topvar}(x, y)$ between $x$ and $y$ as $\mathrm{slope}(x)$ if $f(x) > f(y)$ (and symmetrically as $\mathrm{slope}(y)$ if $f(x) < f(y)$), and as $\frac{\mathrm{slope}(x) + \mathrm{slope}(y)}{2}$ if $f(x) = f(y)$.

354 F. Meyer and P. Maragos

Fig. 1. (a) Initial topographic surface, (b) Creation of the marker, (c) Result of the swamping.

If $\pi$ is a path $(x = p_1, p_2, \ldots, y = p_n)$ between two pixels $x$ and $y$, we define the topographical variation along the path $\pi$ as the sum $\sum_{i=1}^{n-1} \mathrm{topvar}(p_i, p_{i+1})$ of the elementary topographical variations along the path $\pi$. The topographical distance between two pixels $x$ and $y$ is defined as the minimal topographical variation along all paths between $x$ and $y$. By construction, the trajectory of a drop of water falling on the surface is a geodesic line of the topographic distance. A pixel $p$ belongs to the upstream of a pixel $q$ if and only if the topographic distance between both pixels is equal to $|f(p) - f(q)|$. Let us now transform the topographic surface by putting all regional minima at altitude 0.

Definition 1. We call catchment basin $CB(m_i)$ of a regional minimum $m_i$ the set of pixels which are closer to $m_i$ than to any other regional minimum for the topographical distance.
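Definition 1 can be turned into a shortest-path computation: run Dijkstra's algorithm from all minima at once, with $\mathrm{topvar}$ as edge cost, and let each pixel inherit the label of the minimum that reaches it first. The sketch below is a minimal illustration, assuming 4-connectivity, nonnegative altitudes and minima given as labelled pixel lists; `lower_slope`, `topvar` and `catchment_basins` are hypothetical names, and ties between equidistant minima are broken arbitrarily by queue order.

```python
import heapq
import numpy as np

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (an assumption)


def lower_slope(f):
    """slope(x) = f(x) - eps(f)(x), eps = erosion of size 1 (x and its neighbours)."""
    h, w = f.shape
    slope = np.zeros_like(f)
    for i in range(h):
        for j in range(w):
            low = f[i, j]
            for di, dj in NEIGHBORS:
                a, b = i + di, j + dj
                if 0 <= a < h and 0 <= b < w:
                    low = min(low, f[a, b])
            slope[i, j] = f[i, j] - low
    return slope


def topvar(f, slope, x, y):
    """Elementary topographic variation between neighbouring pixels x and y."""
    if f[x] > f[y]:
        return slope[x]
    if f[x] < f[y]:
        return slope[y]
    return 0.5 * (slope[x] + slope[y])


def catchment_basins(f, minima):
    """Dijkstra on topvar costs; `minima` maps a label to a list of (i, j) pixels."""
    f = np.array(f, dtype=float)        # copy, so the minima can be put at altitude 0
    for pixels in minima.values():
        for p in pixels:
            f[p] = 0.0
    slope = lower_slope(f)
    h, w = f.shape
    dist = np.full((h, w), np.inf)
    labels = np.zeros((h, w), dtype=int)
    heap = []
    for lab, pixels in minima.items():
        for p in pixels:
            dist[p], labels[p] = 0.0, lab
            heapq.heappush(heap, (0.0, p))
    while heap:
        d, p = heapq.heappop(heap)
        if d > dist[p]:
            continue                    # stale queue entry
        for di, dj in NEIGHBORS:
            q = (p[0] + di, p[1] + dj)
            if 0 <= q[0] < h and 0 <= q[1] < w:
                nd = d + topvar(f, slope, p, q)
                if nd < dist[q]:
                    dist[q], labels[q] = nd, labels[p]
                    heapq.heappush(heap, (nd, q))
    return dist, labels


f = np.array([[2, 1, 0, 1, 2, 3, 2, 1, 0, 1, 2]])
dist, labels = catchment_basins(f, {1: [(0, 2)], 2: [(0, 8)]})
print(labels[0, 4], dist[0, 4])  # 1 2.0
```

Consistent with the upstream property above, the distance from the minimum at column 2 to column 4 equals the altitude difference $f(4) - f(2) = 2$.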

A more general description of the topographic distance, also valid for images 
with plateaus may be found in [7]. Within each catchment basin, the set of pixels 
closer to the minimum than a given topographic distance h are all pixels of this 
basin with an altitude below h. In this framework the construction of the catch- 
ment basins becomes a shortest path problem, i.e., finding the path between a 
marker and an image point that corresponds to the minimum weighted distance. 
Computing this minimum weighted distance at all image points from any marker 
is also equivalent to finding the gray-weighted distance transform (GWDT) of 
the image. There are several types of discrete algorithms to compute the GWDT 
which include iterated (sequential or parallel) min-sum differences [13] and hier- 
archical queues [7]. Instead of elaborating more on discrete GWDT algorithms, 
we prefer now to proceed to our next formulation of watershed that will be 
based on a continuous (PDE-based) model. Afterwards, the discrete GWDT 
will be re-interpreted as one possible discrete approximation to the solution of
the continuous problem. 



3.2 Continuous Watershed and Eikonal PDE 

The watershed transforms an image $f(x, y)$ into the crest lines separating adjacent catchment basins that surround regional minima or other 'marker' sets of feature points. In a continuous formulation, the topographic distance of $f$ along a path becomes the line integral of $\|\nabla f\|$ along this path. Viewing the domain of $f$ as a 2D optical medium with a refractive index field $\eta(x, y) = \|\nabla f\|$ makes the continuous topographic distance function equivalent to the optical path length, which is proportional to the time required for light to travel this path. This leads to the eikonal PDE

$$\|\nabla U(x, y)\| = \eta(x, y), \qquad \eta(x, y) = \|\nabla f(x, y)\| \qquad (2)$$

whose solution for any field $\eta(x, y)$ is a weighted distance function [11,2]. In the continuous domain, and assuming that the image is smooth and has isolated critical points, the continuous watershed is equivalent to finding a skeleton by influence zones with respect to a weighted distance function that uses points in the regional minima of the image as sources and $\eta = \|\nabla f\|$ as the field of indices [9,7]. If markers other than the minima are to be used as sources, then the homotopy of the function must be modified via morphological reconstruction to impose these markers as the only minima.

Modeling the watershed via the eikonal PDE has the advantage of a more isotropic flooding, but it also poses some challenges for its implementation. This problem can be approached by viewing the solution of the eikonal PDE as a gray-weighted distance transform (GWDT), whose value at each pixel gives the minimum distance from the light sources weighted by the gray values of the refractive index field. Next we outline two ways of solving the eikonal PDE as applied to segmentation.



3.3 GWDT based on Chamfer Metrics 

Let $\eta[i, j]$ be a sampled nonnegative gray-level image and let us view it as a discrete refractive index field. Also let $S$ be a set of reference points, or the 'sources' of some wave, or the location of the wavefront at time $t = 0$. As discussed earlier, the GWDT finds at each pixel $p = [i, j]$ the smallest sum of values of $\eta$ over all possible paths connecting $p$ to the sources $S$.

This discrete GWDT can be computed by running a 2D min-sum difference equation like the one implementing the chamfer distance transform of binary images, but with spatially varying coefficients proportional to the gray image values [13]:

$$U_k[i,j] = \min\{U_k[i-1,j] + a\,\eta[i,j],\; U_k[i,j-1] + a\,\eta[i,j],\; U_k[i-1,j-1] + b\,\eta[i,j],\; U_k[i-1,j+1] + b\,\eta[i,j],\; U_{k-1}[i,j]\} \qquad (3)$$
where $U_0$ is the $0/\infty$ indicator function of the source set $S$. Starting from $U_0$, a sequence of functions $U_k$ is iteratively computed by running (3) over the image domain in a forward scan for even $k$, whereas for odd $k$ an equation as in (3) but with a reflected coefficient mask is run in a backward scan. In the limit $k \to \infty$ the final GWDT $U_\infty$ is obtained. In practice, this limit is reached after a finite number of passes. The above implementation can also be viewed as a procedure for finding paths of minimal 'cost' among nodes of a weighted graph, or as discrete dynamic programming; as such it is closely related to Dijkstra's algorithm. There are also other, faster implementations using queues [13,6]. The above GWDT based on discrete chamfer metrics is shown in [13] and [4] to be a discrete approximate solution of the eikonal PDE $\|\nabla U\| = \eta$.

The constants a and b are the distance steps by which the planar chamfer 
distances are propagated within a 3 x 3 neighborhood. To improve the GWDT 
approximation to the eikonal’s solution, one can optimize (a, b) to minimize the 
error between the chamfer and Euclidean distances and/or use larger neighbor- 
hoods (at the cost of a slower implementation). However, using a neighborhood 
larger than 5x5 may give erroneous results since the large masks can bridge over 
a thin line that separates two segmentation regions. Overall, this chamfer metric approach to GWDT is fast and easy to implement, but because of the small neighborhoods it requires, it is not isotropic and cannot achieve high accuracy.
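Recursion (3), together with its reflected backward mask, translates directly into alternating raster scans that stop once a pass changes nothing. The sketch below is illustrative only: it uses the unoptimized steps $a = 1$, $b = \sqrt{2}$ as an assumption (not the optimized values discussed above), and `chamfer_gwdt` is a hypothetical name.

```python
import math
import numpy as np


def chamfer_gwdt(eta, sources, a=1.0, b=math.sqrt(2.0), max_passes=1000):
    """Iterated min-sum recursion (3) with alternating forward/backward raster scans."""
    eta = np.asarray(eta, dtype=float)
    h, w = eta.shape
    U = np.full((h, w), np.inf)             # U_0: 0 on the sources, +inf elsewhere
    for p in sources:
        U[p] = 0.0
    forward = ((-1, 0, a), (0, -1, a), (-1, -1, b), (-1, 1, b))
    backward = ((1, 0, a), (0, 1, a), (1, 1, b), (1, -1, b))  # reflected mask
    for k in range(max_passes):
        if k % 2 == 0:
            rows, cols, mask = range(h), range(w), forward
        else:
            rows, cols, mask = range(h - 1, -1, -1), range(w - 1, -1, -1), backward
        changed = False
        for i in rows:
            for j in cols:
                best = U[i, j]
                for di, dj, c in mask:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        best = min(best, U[ni, nj] + c * eta[i, j])
                if best < U[i, j]:
                    U[i, j] = best
                    changed = True
        # Once both scan directions have run, a pass without change means
        # U is a fixed point of both masks, i.e. U_infinity has been reached.
        if k > 0 and not changed:
            return U
    raise RuntimeError("no convergence within max_passes")


U = chamfer_gwdt(np.ones((7, 7)), [(3, 3)])
print(U[3, 6], round(U[5, 5], 4))  # 3.0 2.8284
```

With a constant index field the GWDT reduces to the planar chamfer distance; feeding $\eta = \|\nabla f\|$ instead gives the gray-weighted version used for segmentation.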



3.4 GWDT based on Curve Evolution 

In the standard digital watershed algorithm [8,14], the flooding at each level 
is achieved by a planar distance propagation that uses the chess-board metric. 
This kind of distance propagation is non-isotropic and could give wrong results, 
particularly for images with large plateaus, as we found experimentally. Eikonal 
segmentation using GWDTs based on chamfer metrics improves this situation 
a little but not entirely. In contrast, for images with large plateaus/regions, 
segmentation via the eikonal PDE and curve evolution GWDT gives results 
close to ideal. 

In the PDE-based watershed approach [5], at time $t = 0$ the boundary of each source is modeled as a curve $\gamma(0)$, which is then propagated with normal speed $c(x, y) = c_0/\eta(x, y) = c_0/\|\nabla f(x, y)\|$, where $c_0$ is the largest constant speed (e.g., the speed of light in vacuum). The propagating curve $\gamma(t)$ is embedded as the zero-level curve of a function $F(x, y, t)$, where $F(x, y, 0) = F_0(x, y)$ is the signed (positive in the curve interior) distance from $\gamma(0)$. The function $F$ evolves according to the PDE

$$\frac{\partial F}{\partial t} = c(x, y)\,\|\nabla F\| \qquad (4)$$

As analyzed in [10,12], this PDE implies that all the level curves of $F$ propagate with a position-dependent normal speed $c(x, y) > 0$. This is a time-dependent formulation of the eikonal PDE and can be solved via the entropy-condition-satisfying numerical algorithm of [10]. The value of the resulting GWDT at any pixel $(x, y)$ of the image is the time it takes for the evolving curve to reach this pixel, i.e. the smallest $t$ such that $F(x, y, t) \geq 0$. The watershed is then found along the lines where wavefronts emanating from different markers collide and extinguish themselves.

To reduce the computational complexity of solving general eikonal PDE problems via curve evolution, a 'fast marching' algorithm was developed in [12,3] that tracks only a narrow band of pixels at the boundary of the propagating wavefront. For the eikonal PDE segmentation problem, a queue-based algorithm has been developed in [5] that combines features of the fast marching method with GWDT computation, and that can deal with the case of multiple sources, where triple points develop at the collision of several wavefronts.
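To make the fast marching idea concrete, here is a minimal first-order fast marching sketch for $\|\nabla U\| = \eta$ on a unit grid with several labelled sources. It is not the queue-based algorithm of [5]: the labelling rule (each pixel inherits the label of the accepted neighbour that last lowered its arrival time) is a simplifying assumption, triple points get no special treatment, and `fast_marching_watershed` is a hypothetical name.

```python
import heapq
import math
import numpy as np


def fast_marching_watershed(eta, sources, h=1.0):
    """First-order fast marching for ||grad U|| = eta; sources maps label -> [(i, j)]."""
    eta = np.asarray(eta, dtype=float)
    n, m = eta.shape
    U = np.full((n, m), np.inf)
    labels = np.zeros((n, m), dtype=int)
    accepted = np.zeros((n, m), dtype=bool)
    heap = []
    for lab, pixels in sources.items():
        for p in pixels:
            U[p], labels[p] = 0.0, lab
            heapq.heappush(heap, (0.0, p))

    def known(i, j):
        """Arrival time of an accepted pixel, +inf otherwise (upwind values only)."""
        if 0 <= i < n and 0 <= j < m and accepted[i, j]:
            return U[i, j]
        return math.inf

    def local_update(i, j):
        """Solve the upwind quadratic (U-a)^2 + (U-b)^2 = (h*eta)^2 at (i, j)."""
        a = min(known(i - 1, j), known(i + 1, j))  # vertical neighbours
        b = min(known(i, j - 1), known(i, j + 1))  # horizontal neighbours
        s = h * eta[i, j]
        if abs(a - b) >= s:                        # only one direction contributes
            return min(a, b) + s
        return 0.5 * (a + b + math.sqrt(2.0 * s * s - (a - b) ** 2))

    while heap:
        t, (i, j) = heapq.heappop(heap)
        if accepted[i, j] or t > U[i, j]:
            continue                               # stale queue entry
        accepted[i, j] = True                      # the front reaches (i, j) at time t
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < m and not accepted[a, b]:
                new_t = local_update(a, b)
                if new_t < U[a, b]:
                    U[a, b], labels[a, b] = new_t, labels[i, j]
                    heapq.heappush(heap, (new_t, (a, b)))
    return U, labels


U, _ = fast_marching_watershed(np.ones((21, 21)), {1: [(10, 10)]})
print(U[10, 13])                   # 3.0 (exact along a grid axis)
print(U[13, 13] > math.sqrt(18))   # True: first-order scheme overestimates diagonals
_, lab = fast_marching_watershed(np.ones((5, 11)), {1: [(2, 0)], 2: [(2, 10)]})
print(lab[2, 2], lab[2, 8])        # 1 2
```

Only the narrow band of tentative pixels is kept in the priority queue, which is what makes the method fast; the watershed lies where pixels with different labels touch.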

As Fig. 2 shows, on a difficult test image (difficult because expanding wavefronts meet watershed lines at many angles, ranging from perpendicular to almost parallel), the continuous segmentation approach based on the eikonal PDE and curve evolution outperforms the discrete segmentation results (obtained with either the digital watershed flooding algorithm or chamfer metric GWDTs). However, some real images may contain neither many plateaus nor large regions, in which case the digital watershed flooding algorithm may give results comparable to those of the eikonal PDE approach.

4 Ultrametric distances associated to flooding 

4.1 Ultrametric distance and multiscale partitions 

The first part of the paper has described the tools for producing the finest partition, from which a multiscale representation may be derived. Let $P_0 = (P_{01}, P_{02}, \ldots, P_{0n})$ be the list of regions forming the finest partition. We are interested in constructing a series of nested partitions $P_k = (P_{k1}, P_{k2}, \ldots, P_{kn})$, where each region $P_{kj}$ is the union of a number of regions of finer partitions $P_l$, for $l < k$.

It is classical to associate an ultrametric distance with the series of nested partitions $(P_k)$:

$$d(P_{0i}, P_{0j}) = \min\{\, l \mid \exists\, P_{lm} \in P_l \text{ such that } P_{0i} \subseteq P_{lm} \text{ and } P_{0j} \subseteq P_{lm} \,\}.$$

In other words, the ultrametric distance is the smallest index $l$ of a partition $P_l$ one of whose sets $P_{lm}$ contains both regions $P_{0i}$ and $P_{0j}$.

It is an ultrametric distance, as it verifies the following axioms:

* reflexivity: $d(P_{0i}, P_{0i}) = 0$

* symmetry: $d(P_{0i}, P_{0j}) = d(P_{0j}, P_{0i})$

* ultrametric inequality: for all $i, j, k$: $d(P_{0i}, P_{0j}) \leq \max\{d(P_{0i}, P_{0k}), d(P_{0k}, P_{0j})\}$

The first two axioms are obviously verified. The last one may be interpreted as follows: the smallest index $l$ of a region $P_{lm}$ containing both regions $P_{0i}$ and $P_{0j}$ is necessarily smaller than or equal to the smallest index $n$ of a region containing all three regions $P_{0i}$, $P_{0j}$ and $P_{0k}$. Since the partitions are nested, this index $n$ is precisely $\max\{d(P_{0i}, P_{0k}), d(P_{0k}, P_{0j})\}$: at that level, the region containing $P_{0k}$ also contains $P_{0i}$ and $P_{0j}$.

An ultrametric distance is a distance, as the ultrametric inequality is stronger than the triangular inequality. A closed ball for the ultrametric distance with centre $P_{0k}$ and radius $n$ is the set of all regions $P_{0i}$ for which $d(P_{0i}, P_{0k}) \leq n$. The balls associated with an ultrametric distance have two unique features, which will be useful in segmentation. The radius of a ball is equal to its diameter, i.e. to the largest distance between two elements in the ball. Each element of a ball is the centre of this ball. It is easy to check that the closed balls of radius $n$ precisely constitute the partition $P_n$.

Fig. 2. Performance of various segmentation algorithms on a test image (250 x 400 pixels). This image is the minimum of two potential functions. Its contour plot (thin bright curves) is superimposed on all segmentation results. Markers are the two source points of the potential functions. Segmentation results based on: (a) digital watershed flooding algorithm, (b) GWDT based on optimal 3x3 chamfer metric, (c) GWDT based on optimal 5x5 chamfer metric, (d) GWDT based on curve evolution. (The thick bright curve shows the correct segmentation.)

Conversely, we will associate a series of nested partitions with each ultrametric distance, by taking as partition of rank $n$ the set of closed balls of radius $n$. We will now define several ultrametric distances naturally associated with the flooding of a topographic surface. Each of them will yield a different partition cube.
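The correspondence between nested partitions, ultrametric distances and closed balls can be checked on a toy example. In the sketch below each partition is assumed to be given as a list of labels, one per finest region, with the last level being a single region; the function names and the example partitions are hypothetical.

```python
from itertools import product


def ultrametric_from_partitions(levels):
    """d(i, j) = smallest l such that finest regions i and j share a region of P_l.

    levels[l][i] is the label of finest region P_0i in partition P_l.
    """
    n = len(levels[0])

    def d(i, j):
        for l, labels in enumerate(levels):
            if labels[i] == labels[j]:
                return l
        raise ValueError("regions never merge; add a coarsest level")

    return [[d(i, j) for j in range(n)] for i in range(n)]


def closed_ball(D, centre, radius):
    """All finest regions within ultrametric distance `radius` of `centre`."""
    return frozenset(i for i in range(len(D)) if D[i][centre] <= radius)


def is_ultrametric(D):
    """Check reflexivity, symmetry and the ultrametric inequality on all triples."""
    n = len(D)
    return all(D[i][i] == 0 for i in range(n)) and all(
        D[i][j] == D[j][i] and D[i][j] <= max(D[i][k], D[k][j])
        for i, j, k in product(range(n), repeat=3)
    )


# Hypothetical nested partitions of 4 finest regions (one row per level).
levels = [[0, 1, 2, 3], [0, 0, 1, 2], [0, 0, 1, 1], [0, 0, 0, 0]]
D = ultrametric_from_partitions(levels)
print(D[0][1], D[2][3], D[0][3])  # 1 2 3
print(is_ultrametric(D))          # True
print(sorted(map(sorted, {closed_ball(D, c, 2) for c in range(4)})))  # [[0, 1], [2, 3]]
```

The balls of radius 2 are exactly the regions of $P_2$, and choosing either element of a ball as its centre yields the same ball.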
4.2 Flooding Tree

A finer analysis of the flooding reveals a tree in which the nodes are the catchment basins and the edges represent relations between neighboring nodes. Let us observe the creation and successive fusions of lakes during the flooding. The level of the flood is uniform over the topographic surface and increases with constant speed: new lakes appear as the flood reaches the various regional minima. At the time of its appearance, each lake is isolated. As the level increases and reaches the lowest passpoint separating the corresponding CB from a neighboring CB, two lakes merge. Two types of passpoints are to be distinguished. When the level of the flood reaches the first type, two previously completely disconnected lakes merge; we will call these passpoints first meeting passes. When the flood reaches the second type, two branches of a single lake meet and form a closed loop around an island. Representing each first meeting pass as an edge of a graph, and the adjacent catchment basins as the nodes linked by this edge, creates a graph. It is easy to see that this graph is a tree spanning all nodes. It is in fact the minimum spanning tree (MST) of the neighborhood graph obtained by linking all neighboring catchment basins by an edge weighted by the altitude of the passpoint between them.
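Processing the passpoints in increasing altitude, which is what a uniformly rising flood does, and keeping only the passes that join two distinct lakes is exactly Kruskal's algorithm. The sketch below assumes the neighborhood graph is given as `(altitude, u, v)` triples over basins numbered from 0; the name `flooding_tree` and the example data are hypothetical.

```python
def flooding_tree(num_basins, passes):
    """Kruskal's algorithm on the catchment-basin neighbourhood graph.

    passes: (altitude, u, v) for each pair of neighbouring basins u, v.
    Returns the first meeting passes, i.e. the MST edges, in flooding order.
    """
    parent = list(range(num_basins))       # union-find: one set per lake

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for altitude, u, v in sorted(passes):  # the flood rises uniformly
        ru, rv = find(u), find(v)
        if ru != rv:                       # two disconnected lakes meet
            parent[ru] = rv
            tree.append((altitude, u, v))
        # otherwise two branches of one lake close a loop around an island
    return tree


passes = [(5, 0, 1), (3, 1, 2), (4, 0, 2), (7, 2, 3), (6, 1, 3)]
print(flooding_tree(4, passes))  # [(3, 1, 2), (4, 0, 2), (6, 1, 3)]
```

The passes at altitudes 5 and 7 are of the second type: when the flood reaches them, their two sides already belong to the same lake.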

4.3 Flooding via Ultrametric Distances 

Each edge of the spanning tree represents a passpoint where two disconnected lakes meet. We will assign to this edge a weight derived by measuring some geometric features on each of the adjacent lakes. We consider four different measures. The simplest is the altitude of the passpoint itself. The others are measured on each lake separately: they are respectively the depth, the area and the volume of the lakes. For each of these four types a weight is derived as follows. Let us consider for instance the volume: the volumes of both lakes are compared and the smaller value is chosen as the volumic weight of the edge. Depth and area measures are treated similarly, leading respectively to the weight distributions called dynamics (for the depth) and surfacic weights (for the area). If the altitude of the passpoint is chosen, we get the usual weight distribution of the watershed.
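The four weight distributions can be computed while replaying the merges along the MST in flooding order. In the sketch below each catchment basin is summarized by the list of its pixel altitudes (a hypothetical input format). It assumes that every pixel of a basin lying below the current flood level belongs to the lake, and it measures depth from the lowest minimum of the merged component. The name `lake_weights` is illustrative.

```python
def lake_weights(basin_altitudes, tree_edges):
    """Altitude, depth ('dynamics'), area and volume weights of the MST edges.

    basin_altitudes[b]: pixel altitudes of catchment basin b (hypothetical input).
    tree_edges: first meeting passes (altitude, u, v).
    """
    parent = list(range(len(basin_altitudes)))
    pixels = {b: list(z) for b, z in enumerate(basin_altitudes)}  # per lake root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def lake(root, level):
        """Measures of the lake held by the basins merged under `root`."""
        wet = [z for z in pixels[root] if z < level]
        return {"depth": level - min(pixels[root]),
                "area": len(wet),
                "volume": sum(level - z for z in wet)}

    weights = {}
    for level, u, v in sorted(tree_edges):        # passes in flooding order
        ru, rv = find(u), find(v)
        a, b = lake(ru, level), lake(rv, level)
        weights[u, v] = {key: min(a[key], b[key]) for key in a}  # smaller lake wins
        weights[u, v]["altitude"] = level
        parent[ru] = rv                            # the two lakes merge
        pixels[rv].extend(pixels.pop(ru))
    return weights


w = lake_weights([[0, 1, 1], [2, 2], [0, 3]], [(3, 0, 1), (4, 1, 2)])
print(w[0, 1])  # {'depth': 1, 'area': 2, 'volume': 2, 'altitude': 3}
print(w[1, 2])  # {'depth': 4, 'area': 2, 'volume': 5, 'altitude': 4}
```

At the second pass, the merged lake on the left is much larger (volume 14), but the edge receives the volume of the smaller lake (5), which is what the volumic criterion prescribes.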

We will now define an ultrametric distance associated with each weight distribution on the MST: the distance $d(x, y)$ is defined as the highest weight encountered on the unique path going from $x$ to $y$ along the spanning tree. This relation is obviously reflexive and symmetric. The ultrametric inequality is also verified: for all $x, y, z$, $d(x, y) \leq \max\{d(x, z), d(z, y)\}$, since the highest weight on the unique path going from $x$ to $y$ along the spanning tree is smaller than or equal to the highest weight on the path which goes first from $x$ to $z$ and then from $z$ to $y$ along the spanning tree (the latter path covers every edge of the former).
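The ultrametric distance on the MST is a bottleneck (minimax) distance: a walk from each node that carries the running maximum of the edge weights gives $d(x, \cdot)$ for all targets. The sketch below assumes the tree is given as `(weight, u, v)` triples; `tree_ultrametric` is a hypothetical name.

```python
from collections import defaultdict
from itertools import product


def tree_ultrametric(nodes, tree_edges):
    """d(x, y) = highest weight on the unique path from x to y in a tree."""
    adj = defaultdict(list)
    for w, u, v in tree_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    D = {}
    for s in nodes:
        D[s, s] = 0
        stack, seen = [s], {s}
        while stack:                           # depth-first walk from s
            x = stack.pop()
            for y, w in adj[x]:
                if y not in seen:
                    seen.add(y)
                    D[s, y] = max(D[s, x], w)  # running maximum along the path
                    stack.append(y)
    return D


tree = [(3, 1, 2), (4, 0, 2), (6, 1, 3)]       # e.g. an MST with altitude weights
D = tree_ultrametric(range(4), tree)
print(D[0, 1], D[0, 3], D[1, 2])               # 4 6 3
print(all(D[x, y] <= max(D[x, z], D[z, y]) for x, y, z in product(range(4), repeat=3)))  # True
```

Because a tree has exactly one path between two nodes, the first visit of each node during the walk already gives its final distance.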

The closed balls of the ultrametric distance precisely correspond to the segmentation tree induced by the minimum spanning tree. The balls of radius 0 are the individual nodes, corresponding to the catchment basins. Each ball of radius $n$ is the union of all nodes belonging to one of the subtrees of the MST obtained by cutting all edges with a valuation higher than $n$. A closed ball of radius $R$ and centre $C$ is the set of nodes belonging to the subtree that contains $C$, among the subtrees obtained by cutting the MST edges with a valuation higher than $R$. Obviously, replacing the centre $C$ by any other node of the subtree yields the same subtree.

Cutting the $(k-1)$ highest edges of the minimum spanning tree creates a forest of $k$ trees. This is the forest of $k$ trees of minimal weight contained in the neighborhood graph. Depending on the criterion on which the ultrametric distance is based, the nested segmentations will be more or less useful. The ultrametric distance based on altitude is the least useful. Segmentations based on depth are useful for ranking the particles according to their contrast. The area ultrametric distance focuses on the size of the particles. The volumic ultrametric distance has particularly good psychovisual properties [15]: the resulting segmentation trees offer a good balance between size and contrast, as illustrated in Fig. 3. The topographical surface to be flooded is a color gradient of the initial image (the maximum of the morphological gradients computed in each of the R, G and B color channels). The volumic ultrametric distance has been used, and 3 levels of fusion are represented, corresponding respectively to 15, 35 and 60 regions.
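Extracting a slice of the cube with exactly $k$ regions amounts to dropping the $k-1$ highest MST edges and labelling the connected subtrees. The sketch below assumes `(weight, u, v)` edges on nodes numbered from 0 and breaks ties between equal weights arbitrarily; `cut_forest` is a hypothetical name.

```python
def cut_forest(num_nodes, tree_edges, k):
    """Remove the k-1 highest MST edges; return a region label per node."""
    if not 1 <= k <= num_nodes:
        raise ValueError("k must lie between 1 and the number of nodes")
    kept = sorted(tree_edges)[: len(tree_edges) - (k - 1)]
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in kept:                   # merge along the remaining edges
        parent[find(u)] = find(v)
    roots = {}                             # number the subtrees 0, 1, 2, ...
    return [roots.setdefault(find(x), len(roots)) for x in range(num_nodes)]


tree = [(3, 1, 2), (4, 0, 2), (6, 1, 3)]
print(cut_forest(4, tree, 2))  # [0, 0, 0, 1]
print(cut_forest(4, tree, 3))  # [0, 1, 1, 2]
```

Increasing $k$ only ever splits existing regions, so the slices obtained for successive values of $k$ are nested, as required for a multiscale cube.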




(Figure panels: initial image; 15 regions; 35 regions; 60 regions.)

Fig. 3. Multiscale segmentation example.



5 Applications 

5.1 Interactive segmentation with nested segmentations 

A toolbox for interactive editing based on nested segmentations is currently being built at the CMM [16]. A mouse position is defined by its x-y coordinates and its depth in the segmentation tree. If the mouse is active, the whole tile containing the cursor is activated. Moving the mouse in the x-y plane allows the user to select or deselect regions at the current level of segmentation. Going up produces a coarser region, going down a smaller region. This technique allows the user to "paint" the segmentation with a kind of brush whose shape adapts itself to the contours and whose size may be interactively changed.
6 The watershed from markers

In many situations one has a seed for the objects to segment. It may be the segmentation produced in the preceding frame when one has to track an object in a sequence. It may also be some markers produced by hand in interactive segmentation scenarios. As a result, some nodes of the minimum spanning tree may be identified as markers. The resulting segmentation associated with these markers will then still be a minimum spanning forest, but constrained in that each tree is rooted in a marker. The algorithm for constructing the minimum spanning forest is closely related to the classical algorithms for constructing the MST itself (see [17] for more details). Each marker gets a different label and constitutes the initial part of a tree. The edges are ranked and processed in increasing order. The smallest unprocessed edge linking one of the trees $T$ to an outside node is considered; if this node does not already belong to another tree, it is assigned to the tree $T$. If it belongs to another tree, the edge is discarded and the next edge is processed.
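The marker-constrained forest described above behaves like Prim's algorithm started from all markers at once: a priority queue holds the edges leaving the current trees, the smallest one is popped, and it is discarded if its far end is already labelled. The sketch below assumes integer node ids and `(weight, u, v)` edges; `marker_forest` and the example graph are hypothetical.

```python
import heapq
from collections import defaultdict


def marker_forest(edges, markers):
    """Minimum spanning forest rooted in markers (Prim-like, several roots).

    edges: (weight, u, v) of the neighbourhood graph; markers: node -> label.
    """
    adj = defaultdict(list)
    for w, u, v in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    label = dict(markers)                      # each marker starts its own tree
    heap = [(w, m, nb) for m in markers for w, nb in adj[m]]
    heapq.heapify(heap)
    while heap:
        w, src, dst = heapq.heappop(heap)      # smallest edge leaving a tree
        if dst in label:
            continue                           # dst already in a tree: discard edge
        label[dst] = label[src]
        for w2, nb in adj[dst]:
            if nb not in label:
                heapq.heappush(heap, (w2, dst, nb))
    return label


edges = [(1, 0, 1), (5, 1, 2), (2, 2, 3), (1, 3, 4)]
print(sorted(marker_forest(edges, {0: "A", 4: "B"}).items()))
# [(0, 'A'), (1, 'A'), (2, 'B'), (3, 'B'), (4, 'B')]
```

The boundary between the two trees falls on the heaviest edge (weight 5), which is the edge left unused by the forest.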

Segmenting with markers constitutes the classical morphological method for segmentation. For optimal results, it is important to choose the underlying ultrametric distance correctly. We have presented three new distances that often give better results than the classically used flooding distance (where the weights are the altitudes of the passpoints). This is illustrated in Fig. 4, where the same set of markers has been used alternately with the flooding distance and with the volumic distance. The superiority of the volumic distance clearly appears here: it correctly detects the face, whereas the flooding distance follows the boundary of a shadow and cuts the face in two.




(Figure panels: markers; flooding dist.; volumic dist.)

Fig. 4. Segmentations with different ultrametric floodings.



7 Conclusion 

A multiscale segmentation scheme has been presented, embedded in the flooding 
mechanism of the watershed itself. It opens many new possibilities for segmen- 
tation, either in supervised or unsupervised mode. 







Acknowledgements: P. Maragos’ work in this paper was partially supported by the European TMR/Networks Project ERBFMRXCT970160.

References 

1. S. Beucher and F. Meyer, “The Morphological Approach to Segmentation: The Watershed Transformation”, in E. Dougherty, Ed., Mathematical Morphology in Image Processing, chapter 12, pp. 433-481, Marcel Dekker, 1993.

2. R. Kimmel, N. Kiryati, and A. M. Bruckstein, “Sub-Pixel Distance Maps and 
Weighted Distance Transforms”, J. Math. Imaging & Vision, 6:223-233, 1996. 

3. R. Malladi, J. A. Sethian, and B. C. Vemuri, “A Fast Level Set Based Algorithm for 
Topology-Independent Shape Modeling”, J. Math. Imaging and Vision, 6, pp. 269- 
289, 1996. 

4. P. Maragos, “Differential Morphology and Image Processing”, IEEE Trans. Image Processing, vol. 5, pp. 922-937, June 1996.

5. P. Maragos and M. A. Butt, “Advances in Differential Morphology: Image Segmentation via Eikonal PDE & Curve Evolution and Reconstruction via Constrained Dilation Flow”, in Mathematical Morphology and Its Applications to Image and Signal Processing, H. Heijmans and J. Roerdink, Eds., Kluwer Acad. Publ., 1998, pp. 167-174.

6. F. Meyer, “Integrals and Gradients of Images”, Proc. SPIE vol. 1769: Image Al- 
gebra and Morphological Image Processing III, pp. 200-211, 1992. 

7. F. Meyer, “Topographic Distance and Watershed Lines”, Signal Processing, 38, 
pp. 113-125, 1994. 

8. F. Meyer and S. Beucher, “Morphological Segmentation”, J. Visual Commun. Image Representation, 1(1):21-45, 1990.

9. L. Najman and M. Schmitt, “Watershed of a Continuous Function”, Signal Pro- 
cessing, vol. 38, pp. 99-112, July 1994. 

10. S. Osher and J. Sethian, “Fronts Propagating with Curvature-Dependent Speed: 
Algorithms Based on Hamilton-Jacobi Formulations”, J. Comput. Physics, 79, 
pp. 12-49, 1988. 

11. E. Rouy and A. Tourin, “A Viscosity Solutions Approach to Shape from Shading”, SIAM J. Numer. Anal., vol. 29 (3), pp. 867-884, June 1992.

12. J. A. Sethian, Level Set Methods, Cambridge Univ. Press, 1996. 

13. P. Verbeek and B. Verwer, “Shading from shape, the eikonal equation solved by grey-weighted distance transform”, Pattern Recogn. Lett., 11:681-690, 1990.

14. L. Vincent and P. Soille, “Watershed In Digital Spaces: An Efficient Algorithm 
Based On Immersion Simulations”, IEEE Trans. Pattern Anal. Mach. Intellig., 
vol. 13, pp. 583-598, June 1991. 

15. C. Vachier, “Extraction de Caractéristiques, Segmentation d’Image et Morphologie Mathématique”, PhD thesis, E.N.S. des Mines de Paris, 1995.

16. F. Zanoguera, B. Marcotegui and F. Meyer, “An interactive colour image segmenta- 
tion system” , Wiamis ’99: Workshop on Image Analysis for Multimedia Interactive 
Services, pp. 137-141. Heinrich-Hertz Institut Berlin, 1999. 

17. F. Meyer, “Minimal Spanning Forests for Morphological Segmentation”, ISMM94: Mathematical Morphology and its Applications to Signal Processing, pp. 77-84, 1994.




Nonlinear PDEs and Numerical Algorithms for 
Modeling Levelings and Reconstruction Filters 



Petros Maragos¹ and Fernand Meyer²

¹ National Technical University of Athens, Dept. of Electrical & Computer Engineering, Zografou 15773, Athens, Greece. Email: maragos@cs.ntua.gr
² Centre de Morphologie Mathematique, Ecole des Mines de Paris, 35, Rue Saint Honore, 77305 Fontainebleau, France. Email: meyer@cmm.ensmp.fr



Abstract. In this paper we develop partial differential equations (PDEs) 
that model the generation of a large class of morphological filters, the 
levelings and the openings/closings by reconstruction. These types of 
filters are very useful in numerous image analysis and vision tasks rang- 
ing from enhancement, to geometric feature detection, to segmentation. 
The developed PDEs are nonlinear functions of the first spatial deriva- 
tives and model these nonlinear filters as the limit of a controlled growth 
starting from an initial seed signal. This growth is of the multiscale di- 
lation or erosion type and the controlling mechanism is a switch that 
reverses the growth when the difference between the current evolution 
and a reference signal switches signs. We discuss theoretical aspects of 
these PDEs, propose discrete algorithms for their numerical solution and 
corresponding filter implementation, and provide insights via several ex- 
periments. Finally, we outline the use of these PDEs for improving the 
Gaussian scale-space by using the latter as initial seed to generate mul- 
tiscale levelings that have a superior preservation of image edges and 
boundaries. 



1 Introduction 

For several tasks in computer vision, especially those related to scale-space image analysis, continuous models based on partial differential equations (PDEs) have been proposed. Motivations for using PDEs include better and more intuitive mathematical modeling, connections with physics, and better approximation to the Euclidean geometry of the problem. While many such continuous approaches have been linear (the most notable example being the isotropic heat diffusion PDE for modeling the Gaussian scale-space), many of the most useful ones are nonlinear. This is partly due to a general understanding that linear systems are limited in, or incapable of, successfully modeling several important vision problems.

Areas where there is a need to develop nonlinear approaches include the class of problems related to scale-space analysis and multiscale image smoothing. Linear smoothers shift and blur image edges, whereas a large variety of nonlinear smoothers either suffer less from or completely avoid these shortcomings. Simple examples are the classic morphological openings and closings (cascades of erosions and dilations) as well as the median filters. The openings suppress signal peaks, the closings eliminate valleys, whereas the medians have a more symmetric behavior. All three filter types preserve vertical image edges well but may shift and blur horizontal edges/boundaries. A much more powerful class of filters are the reconstruction openings and closings. Starting from a reference signal $f$ consisting of several parts and a marker (initial seed) $g$ inside some of these parts, they can reconstruct whole objects with exact preservation of their boundaries and edges. In this reconstruction process they simplify the original image by completely eliminating smaller objects inside which the marker cannot fit. The reconstruction filters enlarge the flat zones of the image [15]. One of their disadvantages is that they treat the image foreground and background asymmetrically. A recent solution to this asymmetry problem came from the development of a more general and powerful class of morphological filters, the levelings [10,11], which include the reconstruction openings and closings as special cases. They are transformations $\Psi(f,g)$ that depend on two signals, the reference $f$ and the marker $g$. Reconstruction filters and levelings have found numerous applications in a large variety of problems involving image enhancement and simplification, geometric feature detection, and segmentation. They also possess many useful algebraic and scale-space properties, discussed in a companion paper [12].

In this paper we develop PDEs that can model and generate levelings. These PDEs work by growing a marker (initial seed) signal $g$. The extent of the growth is controlled by a reference signal $f$, and its type (expansion or shrinking growth) is switched by the sign of the difference between $f$ and the current evolution. This growth is modeled by PDEs that can generate multiscale dilations or erosions. Therefore, we start with a background section on dilation PDEs. Afterwards, we introduce a PDE for levelings of 1D signals and a PDE for levelings of 2D images, propose discrete numerical algorithms for their implementation, and provide insights via experiments. We also discuss how to obtain reconstruction openings and closings from the general leveling PDE. Further, we develop alternative PDEs for modeling generalized levelings that create quasi-flat zones. Finally, we outline the use of these PDEs for improving the Gaussian scale-space by using the latter as initial seed to generate multiscale levelings that have a superior preservation of image edges and boundaries.

2 Dilation/Erosion PDEs 

All multiscale morphological operations, at their most basic level, are generated by multiscale dilations and erosions. These are obtained by replacing, in the standard dilations/erosions, the unit-scale kernel (structuring element) $K(x,y)$ with a multiscale version $K_t(x,y) = tK(x/t, y/t)$, $t > 0$. The multiscale dilation of a 2D signal $f(x,y)$ by $K_t$ is the space-scale function

$$\delta(x,y,t) = (f \oplus K_t)(x,y) = \sup_{(a,b)} \{ f(x-a, y-b) + tK(a/t, b/t) \}, \qquad t > 0,$$

where $\delta(x,y,0) = f(x,y)$. Similarly, the multiscale erosion of $f$ is defined as

$$\varepsilon(x,y,t) = (f \ominus K_t)(x,y) = \inf_{(a,b)} \{ f(x+a, y+b) - tK(a/t, b/t) \}.$$
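For a sampled 1D signal and the flat kernel used below (the $0/-\infty$ indicator of $[-1,1]$, so that $tK(a/t)$ is the indicator of $[-t,t]$), these definitions reduce to a running max/min over a window of half-width $t$. The following is a minimal sketch under those assumptions. The function names are our own, and samples outside the signal are treated as $-\infty$ (ignored).

```python
import numpy as np


def flat_dilation(f, t, dx=1.0):
    """Multiscale flat dilation delta(x, t) = sup_{|a| <= t} f(x - a) of a 1D signal.

    Assumes K is the 0/-inf indicator of [-1, 1], so tK(a/t) is the indicator
    of [-t, t]; the window half-width in samples is round(t / dx). Samples
    outside the signal are treated as -inf, i.e. ignored.
    """
    f = np.asarray(f, dtype=float)
    r = int(round(t / dx))
    padded = np.pad(f, r, mode="constant", constant_values=-np.inf)
    # Row k holds the signal shifted by k - r samples; the column-wise max
    # is the sup over the 2r + 1 samples centred on each position.
    shifted = np.stack([padded[k:k + f.size] for k in range(2 * r + 1)])
    return shifted.max(axis=0)


def flat_erosion(f, t, dx=1.0):
    """Multiscale flat erosion eps(x, t) = inf_{|a| <= t} f(x + a), by duality."""
    return -flat_dilation(-np.asarray(f, dtype=float), t, dx)
```

Because $[-s,s] \oplus [-t,t] = [-(s+t), s+t]$, dilating at scale 2 and then at scale 3 gives the same result as dilating at scale 5. This is the semigroup structure on which the PDE formulations rely.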

Until recently the vast majority of implementations of multiscale morphological 
filtering had been discrete. In 1992, three teams of researchers independently 
published nonlinear PDEs that model the continuous multiscale morphological 
scale-space. In [1] PDEs were obtained for multiscale flat dilation and erosion 
by compact convex sets as part of a general work on developing PDE-based 
models for multiscale image processing that satisfy certain axiomatic principles. 
In [4] PDEs were developed that model multiscale dilation, erosion, opening 
and closing by compact-support convex sets or concave functions which may 
have non-smooth boundaries or graphs, respectively. This work was based on 
the semigroup structure of the multiscale dilation and erosion operators and the 
use of sup/inf derivatives to deal with the development of shocks. In [18] PDEs 
were obtained by studying the propagation of boundaries of 2D sets or signal 
graphs under multiscale dilation and erosion, provided that these boundaries 
contain no linear segments, are smooth and possess a unique normal at each 
point. Refinements of the above three works for PDEs modeling multiscale mor- 
phology followed in [2,5,6,8,9,19]. The basic dilation PDE was applied in [3,16] 
for modeling continuous-scale morphology, where its superior performance over 
discrete morphology was noted in terms of isotropy and subpixel accuracy. Next 
we provide a few examples.¹

For 1D signals $f(x)$, if $K(x)$ is the $0/-\infty$ indicator function of the interval $[-1, 1]$, then the PDEs generating the multiscale flat dilation $\delta(x,t)$ and erosion $\varepsilon(x,t)$ of $f$ are:

$$\delta_t = |\delta_x|, \qquad \varepsilon_t = -|\varepsilon_x| \tag{1}$$

with initial values $\delta(x,0) = \varepsilon(x,0) = f(x)$.
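To see that the dilation PDE in (1) generates the flat dilation, one can evolve it numerically and compare the result with the dilation by $[-T, T]$. The sketch below is our own illustration, not the paper's code. It uses a first-order upwind gradient of the kind used for Hamilton-Jacobi equations [14]: the slope is taken from the neighbour with the larger value. It assumes replicate ("edge") padding at the ends and $\Delta t/\Delta x \le 0.5$. Such first-order schemes slightly round off corners, by a few $\Delta x$.

```python
import numpy as np


def dilation_pde_1d(f, dx, dt, n_steps):
    """Evolve delta_t = |delta_x| from delta(x, 0) = f with an upwind scheme.

    The gradient magnitude uses max(D+, 0) and min(D-, 0), i.e. the
    differences pointing towards larger neighbours, which is where a
    dilation takes its values from. Edge padding gives zero slope at the ends.
    """
    if dt / dx > 0.5:
        raise ValueError("this sketch assumes dt/dx <= 0.5")
    u = np.asarray(f, dtype=float).copy()
    for _ in range(n_steps):
        p = np.pad(u, 1, mode="edge")
        d_plus = (p[2:] - p[1:-1]) / dx     # forward difference D_x^+ u
        d_minus = (p[1:-1] - p[:-2]) / dx   # backward difference D_x^- u
        grad = np.sqrt(np.maximum(d_plus, 0.0) ** 2 + np.minimum(d_minus, 0.0) ** 2)
        u = u + dt * grad
    return u
```

For example, evolving the triangle $f(x) = \max(0, 1-|x|)$ up to $T = 0.5$ should approximate $\min(1, \max(0, 1.5-|x|))$. The peak turns into a plateau, and the slopes move outwards by $T$.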

For 2D signals $f(x,y)$, if $K(x,y)$ is the $0/-\infty$ indicator function of the unit disk, then the PDEs generating the multiscale flat dilation $\delta(x,y,t)$ and erosion $\varepsilon(x,y,t)$ of $f$ are:

$$\delta_t = \|\nabla\delta\| = \sqrt{(\delta_x)^2 + (\delta_y)^2}, \qquad \varepsilon_t = -\|\nabla\varepsilon\| \tag{2}$$

with initial values $\delta(x,y,0) = \varepsilon(x,y,0) = f(x,y)$.

These simple but nonlinear PDEs are satisfied at points where the data are smooth, i.e., where the partial derivatives exist. However, even if the initial image/signal $f$ is smooth, at finite scales $t > 0$ the above multiscale dilation evolution may create discontinuities in the derivatives of $\delta$, called shocks, which then continue propagating in scale-space. Thus, the multiscale dilations are weak solutions of the corresponding PDEs.

The above PDEs for dilations of graylevel images by flat structuring elements 
directly apply to binary images, because flat dilations commute with threshold- 
ing and hence, when the graylevel image is dilated, each one of its thresholded 

¹ Notation: For $u = u(x,y,t)$, $u_t = \partial u/\partial t$, $u_x = \partial u/\partial x$, $u_y = \partial u/\partial y$, $\nabla u = (u_x, u_y)$.




366 P. Maragos and F. Meyer 



versions representing a binary image is simultaneously dilated by the same element and at the same scale. However, this is not the case with graylevel structuring functions. For example, if $K(x,y) = -a(x^2 + y^2)$, $a > 0$, is an infinite-support parabolic function, the dilation PDE becomes

$$\delta_t = \|\nabla\delta\|^2/4a = [(\delta_x)^2 + (\delta_y)^2]/4a \tag{3}$$
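A quick sanity check of (3) in 1D uses a linear signal $f(x) = cx$. Maximising $c(x-b) - ab^2/t$ over $b$ gives $b^* = -ct/(2a)$ and $\delta(x,t) = cx + c^2t/(4a)$, so $\delta_t = c^2/(4a) = \delta_x^2/(4a)$. The brute-force sketch below evaluates the sup on a fine grid of offsets and confirms this numerically. The function name and the offset grid are our own choices.

```python
import numpy as np


def parabolic_dilation(f, x, t, a, offsets):
    """Brute-force multiscale dilation by K_t(b) = t*K(b/t) = -a*b**2/t.

    f is a callable; `offsets` is a 1D grid of candidate b values that is
    assumed wide and fine enough to contain the maximiser of
    f(x - b) - a*b**2/t for every x.
    """
    vals = f(x[:, None] - offsets[None, :]) - a * offsets[None, :] ** 2 / t
    return vals.max(axis=1)


c, a = 2.0, 0.5
x = np.linspace(-1.0, 1.0, 5)
b = np.linspace(-10.0, 10.0, 200001)        # step 1e-4, contains b* = -c*t/(2a)


def line(s):
    return c * s


d1 = parabolic_dilation(line, x, 1.0, a, b)  # should equal c*x + c**2/(4a)
h = 1e-3                                     # central difference in t
dt_fd = (parabolic_dilation(line, x, 1.0 + h, a, b)
         - parabolic_dilation(line, x, 1.0 - h, a, b)) / (2 * h)
```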

3 PDE for 1D Leveling

Consider a 1D signal $f(x)$ and a marker signal $g(x)$ from which a leveling $\Psi(f,g)$ will be produced.

Suppose $g \le f$ everywhere, and we iteratively grow $g$ via incremental flat dilations with an infinitesimally small element $[-\Delta t, \Delta t]$, but never let the result grow above the graph of $f$. Then in the limit we obtain the opening by reconstruction of $f$ (with respect to the marker $g$), which is a special leveling. The infinitesimal generator of this signal evolution can be modeled via a dilation PDE with a mechanism that stops the growth whenever the intermediate result attempts to become larger than $f$. Specifically, let $u(x,t)$ represent the evolution of the marker, with initial value $u_0(x) = u(x,0) = g(x)$. Then $u$ is a weak solution of the following initial-value PDE system:

$$u_t = \mathrm{sign}(f - u)\,|u_x| = \begin{cases} |u_x|, & u < f \\ 0, & u = f \text{ or } u_x = 0 \end{cases} \tag{4}$$

$$u(x,0) = g(x) \le f(x) \tag{5}$$

where $\mathrm{sign}(r)$ is equal to $+1$ if $r > 0$, $-1$ if $r < 0$ and $0$ if $r = 0$. This PDE models a conditional dilation that grows the intermediate result as long as it does not exceed $f$. In the limit we obtain the final result $u_\infty(x) = \lim_{t\to\infty} u(x,t)$. The mapping $u_0 \mapsto u_\infty$ is the opening by reconstruction filter.
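The opening by reconstruction also has a classical, purely discrete counterpart. It repeats a unit flat dilation of the marker and clips the result by $f$ (an elementary geodesic dilation) until nothing changes. The sketch below shows that discrete iteration, not the PDE, for intuition about what (4)-(5) converge to. The PDE replaces the 3-sample window with an infinitesimal one. Function names are ours.

```python
import numpy as np


def dilate3(u):
    """Flat dilation by the 3-sample window [-1, 1] (edge samples replicated)."""
    p = np.pad(u, 1, mode="edge")
    return np.maximum(np.maximum(p[:-2], p[1:-1]), p[2:])


def reconstruction_opening(f, g, max_iter=10_000):
    """Grow marker g <= f by unit dilations clipped by f, until stable."""
    f = np.asarray(f, dtype=float)
    u = np.minimum(np.asarray(g, dtype=float), f)   # enforce g <= f
    for _ in range(max_iter):
        nxt = np.minimum(dilate3(u), f)              # conditional (geodesic) dilation
        if np.array_equal(nxt, u):
            break
        u = nxt
    return u
```

With $f$ = `[0, 2, 3, 2, 0, 1, 4, 1, 0]` and a marker that equals 3 at the top of the first bump and 0 elsewhere, the first bump is reconstructed exactly. The second bump, which contains no marker, is removed.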

Now reverse the order between $f$ and $g$ in the above paradigm, i.e., assume that $g(x) \ge f(x)$ for all $x$. Replace the positive growth (dilation) of $g$ with negative growth via erosion, stopping when the intermediate result attempts to become smaller than $f$. We then obtain the closing by reconstruction of $f$ with respect to the marker $g$. This is another special case of a leveling, whose generation can be modeled by the following PDE:

$$u_t = -\mathrm{sign}(u - f)\,|u_x| = \begin{cases} -|u_x|, & u > f \\ 0, & u = f \text{ or } u_x = 0 \end{cases} \tag{6}$$

$$u(x,0) = g(x) \ge f(x) \tag{7}$$



What happens if we use either of the above two PDEs when there is no specific order between $f$ and $g$? The signal evolutions are stored in a function $u(x,t)$ that is a weak solution of the initial-value PDE system

$$u_t(x,t) = |u_x(x,t)|\,\mathrm{sign}[f(x) - u(x,t)], \qquad u(x,0) = g(x) \tag{8}$$




Modeling Levelings and Reconstruction Filters 367 

This PDE has a varying coefficient $\mathrm{sign}(f - u)$ with spatio-temporal dependence, which controls the instantaneous growth and stops it whenever $f = u$. (Of course, there is also no growth at extrema, where $u_x = 0$.) The control mechanism is of a switching type. For each $t$, at points $x$ where $u(x,t) < f(x)$ it acts as a dilation PDE. Hence it shifts parts of the graph of $u(x,t)$ with positive (negative) slope to the left (right), but does not move the extrema points. Wherever $u(x,t) > f(x)$ the PDE acts as an erosion PDE and reverses the direction of propagation. The final result $u_\infty(x) = \lim_{t\to\infty} u(x,t)$ is a general leveling of $f$ with respect to $g$. We call (8) a switched dilation PDE. The switching action of this PDE model occurs at zero crossings of $f - u$, where shocks are developed. Obviously, the PDEs generating the opening and closing by reconstruction are the special cases where $g \le f$ and $g \ge f$, respectively. However, unlike (8), the PDEs generating the reconstruction filters do not involve switching of growth.
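The switched behaviour of (8) also has a simple discrete analogue: iterate $u \leftarrow \max(\min(f, \delta u), \varepsilon u)$, where $\delta$ and $\varepsilon$ are unit flat dilation and erosion. Where $u < f$ this acts as a dilation clipped by $f$; where $u > f$ it acts as an erosion clipped by $f$. The sketch below is our own discrete illustration in the spirit of the levelings of [10,11]; the exact discrete algorithm used there is not given in this text. At convergence, the result satisfies the neighbour criterion of a leveling: if $u_p > u_q$ for neighbours $p, q$, then $f_p \ge u_p$ and $f_q \le u_q$.

```python
import numpy as np


def _dil(u):
    p = np.pad(u, 1, mode="edge")
    return np.maximum(np.maximum(p[:-2], p[1:-1]), p[2:])   # unit flat dilation


def _ero(u):
    p = np.pad(u, 1, mode="edge")
    return np.minimum(np.minimum(p[:-2], p[1:-1]), p[2:])   # unit flat erosion


def discrete_leveling(f, g, max_iter=100_000):
    """Iterate u <- max(min(f, dil(u)), ero(u)) from u = g until stable.

    Points below f only grow (dilation clipped by f) and points above f
    only shrink (erosion clipped by f), so u never crosses f.
    """
    f = np.asarray(f, dtype=float)
    u = np.asarray(g, dtype=float).copy()
    for _ in range(max_iter):
        nxt = np.maximum(np.minimum(f, _dil(u)), _ero(u))
        if np.array_equal(nxt, u):
            break
        u = nxt
    return u
```

When $g \le f$ this reduces to the reconstruction opening, and when $g \ge f$ to the reconstruction closing, mirroring the special cases of (8).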

Switching between dilation- and erosion-type behaviour also occurs in a class of nonlinear time-dependent PDEs proposed in [13]. These PDEs deblur images and/or enhance their contrast by generating shocks and hence sharpening edges. For 1D images a special case of such a PDE is

$$u_t = -|u_x|\,\mathrm{sign}(u_{xx}) \tag{9}$$

A major conceptual difference between the above edge-sharpening PDE and our PDE generating levelings lies in what controls the switching. In the former, the switching is determined by the edges, i.e., the inflection points of $u$ itself. In the latter, the switching is controlled by comparing $u$ against the reference signal $f$. Note also that, if at some point there is an edge in the leveling output, then there must exist an edge of equal or bigger size in the initial (reference) image.



3.1 Discretization, Algorithm, Experiments

To produce a shock-capturing and entropy-satisfying numerical method for solving the general leveling PDE (8), we use ideas from the technology of solving PDEs corresponding to hyperbolic conservation laws [7] and Hamilton-Jacobi formulations [14]. Thus, we propose the following discretization scheme, which is an adaptation of a scheme proposed in [13] for solving (9).

Let $U_i^n$ be the approximation of $u(x,t)$ on a grid $(i\Delta x, n\Delta t)$. Consider the forward and backward difference operators:

$$D_x^+ u = \frac{u(x+\Delta x,t) - u(x,t)}{\Delta x}, \qquad D_x^- u = \frac{u(x,t) - u(x-\Delta x,t)}{\Delta x} \tag{10}$$

(Similarly we define the difference operators $D_y^+$ and $D_y^-$ along the $y$ direction.) Then we approximate the leveling PDE (8) by the following nonlinear difference equation:

$$U_i^{n+1} = U_i^n + \Delta t \Big[ (S_i^n)^+ \sqrt{((D_x^+U_i^n)^+)^2 + ((D_x^-U_i^n)^-)^2} + (S_i^n)^- \sqrt{((D_x^+U_i^n)^-)^2 + ((D_x^-U_i^n)^+)^2} \Big] \tag{11}$$

where $S_i^n = \mathrm{sign}(f(i\Delta x) - U_i^n)$, $r^+ = \max(0,r)$, and $r^- = \min(0,r)$. For stability, $\Delta t/\Delta x \le 0.5$ is required. Further, at each iteration we enforce the sign consistency

$$\mathrm{sign}(U^n - f) = \mathrm{sign}(g - f) \tag{12}$$

We have not proved theoretically that the above iterated scheme converges as $n \to \infty$. However, in many experiments we have observed that it converges in a finite number of steps. Examples are shown in Fig. 1.
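A minimal Python sketch of scheme (11) with the sign consistency (12) follows. The assumptions are: replicate padding at the ends; (12) enforced by resetting to $f$ any sample that crossed $f$; and a stopping rule of our own choosing (stop when no sample changes by more than `tol`), since the text reports convergence but gives no criterion. The function name is hypothetical.

```python
import numpy as np


def _pos(r):
    return np.maximum(r, 0.0)   # r^+


def _neg(r):
    return np.minimum(r, 0.0)   # r^-


def leveling_pde_1d(f, g, dx, dt, max_steps=100_000, tol=1e-12):
    """Leveling of f w.r.t. marker g by iterating (11) and enforcing (12)."""
    if dt / dx > 0.5:
        raise ValueError("stability requires dt/dx <= 0.5")
    f = np.asarray(f, dtype=float)
    u = np.asarray(g, dtype=float).copy()
    side = np.sign(u - f)                      # sign(g - f), preserved by (12)
    for _ in range(max_steps):
        p = np.pad(u, 1, mode="edge")
        dp = (p[2:] - p[1:-1]) / dx            # D_x^+ U
        dm = (p[1:-1] - p[:-2]) / dx           # D_x^- U
        grow = np.sqrt(_pos(dp) ** 2 + _neg(dm) ** 2)     # dilation branch
        shrink = np.sqrt(_neg(dp) ** 2 + _pos(dm) ** 2)   # erosion branch
        s = np.sign(f - u)                     # S_i^n
        nxt = u + dt * (_pos(s) * grow + _neg(s) * shrink)
        crossed = np.sign(nxt - f) != side     # samples that jumped across f
        nxt[crossed] = f[crossed]
        if np.max(np.abs(nxt - u)) <= tol:
            return nxt
        u = nxt
    return u
```

For instance, with $f(x) = \sin(\pi x)$ on $[0,1]$ and the marker $g = f - 0.3$, the evolution should converge to the reconstruction opening $\min(f, 0.7)$. The peak of the marker spreads into a plateau at 0.7, and everything below that level is restored exactly.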

4 PDE for 2D Leveling

A straightforward extension of the leveling PDE from 1D to 2D signals is to replace the 1D dilation PDE with the PDE generating multiscale dilations by a disk. Then the 2D leveling PDE becomes:

$$u_t(x,y,t) = \|\nabla u(x,y,t)\|\,\mathrm{sign}[f(x,y) - u(x,y,t)], \qquad u(x,y,0) = g(x,y) \tag{13}$$

Of course, we could select any other PDE modeling dilations by shapes other than the disk, but the disk has the advantage of creating an isotropic growth.

For discretization, let $U_{i,j}^n$ be the approximation of $u(x,y,t)$ on a computational grid $(i\Delta x, j\Delta y, n\Delta t)$. Then we approximate the leveling PDE (13) by the following 2D nonlinear difference equation:

$$U_{i,j}^{n+1} = U_{i,j}^n + \Delta t \Big[ (S_{i,j}^n)^+ \sqrt{((D_x^+U_{i,j}^n)^+)^2 + ((D_x^-U_{i,j}^n)^-)^2 + ((D_y^+U_{i,j}^n)^+)^2 + ((D_y^-U_{i,j}^n)^-)^2} + (S_{i,j}^n)^- \sqrt{((D_x^+U_{i,j}^n)^-)^2 + ((D_x^-U_{i,j}^n)^+)^2 + ((D_y^+U_{i,j}^n)^-)^2 + ((D_y^-U_{i,j}^n)^+)^2} \Big] \tag{14}$$

where $S_{i,j}^n = \mathrm{sign}(f(i\Delta x, j\Delta y) - U_{i,j}^n)$. For stability, $(\Delta t/\Delta x + \Delta t/\Delta y) \le 0.5$ is required. Also, the sign consistency (12) is enforced at each iteration.

Three examples of the action of the above 2D algorithm are shown in Fig. 2. 
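The 2D scheme (14) vectorizes in the same way as the 1D one. Below is a minimal sketch under the same assumptions: replicate padding at the image border, (12) enforced by resetting crossed pixels to $f$, and our own stopping tolerance. Rows are taken as $y$ and columns as $x$. The function name is hypothetical. The defaults $\Delta x = \Delta y = 1$, $\Delta t = 0.25$ are the values quoted for Fig. 2.

```python
import numpy as np


def _pos(r):
    return np.maximum(r, 0.0)   # r^+


def _neg(r):
    return np.minimum(r, 0.0)   # r^-


def leveling_pde_2d(f, g, dx=1.0, dy=1.0, dt=0.25, max_steps=100_000, tol=1e-12):
    """Leveling of image f w.r.t. marker g by iterating (14) and enforcing (12)."""
    if dt / dx + dt / dy > 0.5 + 1e-12:
        raise ValueError("stability requires dt/dx + dt/dy <= 0.5")
    f = np.asarray(f, dtype=float)
    u = np.asarray(g, dtype=float).copy()
    side = np.sign(u - f)                       # sign(g - f), preserved by (12)
    for _ in range(max_steps):
        p = np.pad(u, 1, mode="edge")
        c = p[1:-1, 1:-1]
        dxp = (p[1:-1, 2:] - c) / dx            # D_x^+ U
        dxm = (c - p[1:-1, :-2]) / dx           # D_x^- U
        dyp = (p[2:, 1:-1] - c) / dy            # D_y^+ U
        dym = (c - p[:-2, 1:-1]) / dy           # D_y^- U
        grow = np.sqrt(_pos(dxp)**2 + _neg(dxm)**2 + _pos(dyp)**2 + _neg(dym)**2)
        shrink = np.sqrt(_neg(dxp)**2 + _pos(dxm)**2 + _neg(dyp)**2 + _pos(dym)**2)
        s = np.sign(f - u)                      # S_ij^n
        nxt = u + dt * (_pos(s) * grow + _neg(s) * shrink)
        crossed = np.sign(nxt - f) != side      # enforce (12)
        nxt[crossed] = f[crossed]
        if np.max(np.abs(nxt - u)) <= tol:
            return nxt
        u = nxt
    return u
```

For example, with a reference image containing two bright squares and a single marker pixel inside one of them, the marked square is reconstructed and the unmarked one disappears, as expected of a reconstruction opening.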

5 Discussion and Extensions 

5.1 PDEs for Levelings with Quasi-Flat Zones 

So far all the previous leveling PDEs produce filtering outputs that consist of 
portions of the original (reference) signal and of flat zones (plateaus). Actually 
they enlarge the flat zones of the reference signal. Is it possible to generate 
via PDEs generalized levelings that have quasi-flat zones? For example, zones 
with constant linear slope or zones with parabolic surface? The answer is yes. 
We illustrate it via the parabolic example. If we replace the flat dilation PDE generator in (8) with the PDE generator for multiscale dilations by a 1D unit-scale parabola $K(x) = -ax^2$, we obtain the PDE for 1D parabolic levelings:

$$u_t(x,t) = \frac{1}{4a}|u_x(x,t)|^2\,\mathrm{sign}[f(x) - u(x,t)], \qquad u(x,0) = g(x) \tag{15}$$

To obtain the PDE for 2D parabolic levelings we replace $|u_x|$ with $\|\nabla u\|$.
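As the text notes, only the generator changes. A straightforward way to discretize (15), assumed here rather than taken from the paper, is to keep the upwind differences of (11) and replace each square root by the squared upwind slope divided by $4a$. Because the growth rate now scales with the squared slope, the admissible $\Delta t$ depends on the steepest slope, and the evolution slows down as slopes flatten. The sketch therefore runs a fixed number of steps instead of iterating to convergence. The function name is hypothetical.

```python
import numpy as np


def parabolic_leveling_1d(f, g, dx, dt, a, n_steps):
    """Run n_steps of an upwind scheme for (15): u_t = |u_x|^2/(4a) * sign(f - u).

    Same upwind differences and sign consistency (12) as the flat scheme;
    dt must be small relative to a*dx/max|u_x| (caller's responsibility).
    """
    f = np.asarray(f, dtype=float)
    u = np.asarray(g, dtype=float).copy()
    side = np.sign(u - f)
    for _ in range(n_steps):
        p = np.pad(u, 1, mode="edge")
        dp = (p[2:] - p[1:-1]) / dx            # D_x^+ U
        dm = (p[1:-1] - p[:-2]) / dx           # D_x^- U
        grow = (np.maximum(dp, 0) ** 2 + np.minimum(dm, 0) ** 2) / (4 * a)
        shrink = (np.minimum(dp, 0) ** 2 + np.maximum(dm, 0) ** 2) / (4 * a)
        s = np.sign(f - u)
        u = u + dt * (np.maximum(s, 0) * grow + np.minimum(s, 0) * shrink)
        crossed = np.sign(u - f) != side       # enforce (12)
        u[crossed] = f[crossed]
    return u
```

With a marker below $f$, the evolution stays between $g$ and $f$, and the marker's maximum (where $u_x = 0$) does not move.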
Fig. 1. Evolutions of the 1D leveling PDE for 3 different markers. For each row, the right column shows the reference signal $f$ (dashed line), the marker (thin solid line), and the leveling (thick solid line). The left column shows the marker and 5 of its evolutions at times $t = 20n\Delta t$, $n = 1, 2, 3, 4, 5$. In row (a,b) we see the general leveling evolutions for an arbitrary marker. In row (c,d) the marker was an erosion of $f$ minus a constant, and hence the leveling is a reconstruction opening. In row (e,f) the marker was a dilation of $f$ plus a constant, and hence the leveling is a reconstruction closing. ($\Delta x = 0.001$, $\Delta t = 0.0005$.)
Fig. 2. Evolutions of the 2D leveling PDE on the reference top image (a) using 3 markers. Each column shows evolutions from the same marker. The second row shows the markers ($t = 0$), the third and fourth rows show two evolutions at $t = 10\Delta t$ and $t = 20\Delta t$, and the fifth row shows the final levelings (after convergence). For the left column (b-e), the marker (b) was obtained from a 2D convolution of $f$ with a Gaussian of $\sigma = 4$. For the middle column (f-i), the marker (f) was a simple opening by a square of 9 × 9 pixels, and hence the corresponding leveling (i) is a reconstruction opening. For the right column (j-m), the marker (j) was a simple closing by a square of 9 × 9 pixels, and hence the corresponding leveling (m) is a reconstruction closing. ($\Delta x = \Delta y = 1$, $\Delta t = 0.25$.)

Fig. 3. 1D multiscale levelings. (a) Original (reference) signal $f$ (dashed line) and 3 markers $g_i$ obtained by convolving $f$ with Gaussians of standard deviations $\sigma_i = 30, 40, 50$. (b)-(d) show reference signals $g_i$ (dashed line), markers $g_{i+1}$ (dotted line), and levelings $\Psi(g_i, g_{i+1})$ (solid line) for $i = 0, 1, 2$, where $g_0 = f$. ($\Delta x = 0.001$, $\Delta t = 0.0005$.)
[Fig. 4 panels include: Original Reference; Marker 3; Leveling 3]

Fig. 4. Multiscale image levelings. The markers were obtained by convolving the reference image with 2D Gaussians of standard deviations $\sigma = 3, 5, 7$. ($\Delta x = \Delta y = 1$, $\Delta t = 0.25$.)
5.2 Why Use PDEs For Levelings?

The PDE approach has well-known advantages: more insightful mathematical modeling, more connections with physics, better isotropy, better approximation of Euclidean geometry, and subpixel accuracy. In addition, some applications may need to stop the marker growth before convergence during the construction of levelings or reconstruction filters. In such cases, the isotropy of the partially grown marker offered by the PDE is an advantage. Further, there are no simple digital algorithms for constructing levelings with quasi-flat zones, whereas the PDE approach needs only a simple change of the generator.



5.3 From Gaussian Scale-Space to Multiscale Levelings 

Consider a reference signal $f$ and a leveling $\Psi$. Suppose we can produce various markers $g_i$, $i = 1, 2, 3, \ldots$, related to some increasing scale parameter $i$, and produce the levelings of $f$ with respect to these markers. Then we can generate multiscale levelings in some approximate sense. This scenario gains an important property if we slightly change it to the following hierarchy:

$$h_1 = \Psi(f, g_1), \quad h_2 = \Psi(h_1, g_2), \quad h_3 = \Psi(h_2, g_3), \quad \ldots \tag{16}$$

The above sequence of steps ensures that $h_j$ is a leveling of $h_i$ for $j > i$.

The sequence of markers $g_i$ may be obtained from $f$ in any meaningful way. In this paper we consider the case where the $g_i$ are multiscale convolutions of $f$ with Gaussians of increasing standard deviations $\sigma_i$. Examples of constructing multiscale levelings from Gaussian convolution markers according to (16) are shown in Fig. 3 for a 1D signal and in Fig. 4 for an image $f$. The sequence of the multiscale markers can be viewed as a scale-sampled Gaussian scale-space. As the experiments show, the Gaussian scale-space blurs and shifts image edges and boundaries, whereas the multiscale levelings that use the Gaussian convolutions as markers preserve them better across scales.
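The hierarchy (16) can be sketched in a few lines for a 1D signal. This is an illustration under stated assumptions, not the paper's implementation. The Gaussian markers are sampled, normalised kernels truncated at $4\sigma$ with reflective ends. In place of the PDE, $\Psi$ is the simple discrete leveling iteration $u \leftarrow \max(\min(f,\delta u), \varepsilon u)$ with unit flat dilation and erosion. All function names are hypothetical. The helper `is_leveling` checks the neighbour criterion, which lets one confirm that each $h_j$ is a leveling of its predecessor and, by transitivity, of $f$.

```python
import numpy as np


def gaussian_smooth(f, sigma):
    """Convolve a 1D signal with a sampled, normalised Gaussian (reflective ends)."""
    radius = int(np.ceil(4 * sigma))
    taps = np.arange(-radius, radius + 1)
    kernel = np.exp(-taps**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(f, dtype=float), radius, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")


def leveling(f, g, max_iter=100_000):
    """Discrete stand-in for Psi(f, g): iterate u <- max(min(f, dil u), ero u)."""
    def dil(u):
        p = np.pad(u, 1, mode="edge")
        return np.maximum(np.maximum(p[:-2], p[1:-1]), p[2:])

    def ero(u):
        p = np.pad(u, 1, mode="edge")
        return np.minimum(np.minimum(p[:-2], p[1:-1]), p[2:])

    u = np.asarray(g, dtype=float).copy()
    for _ in range(max_iter):
        nxt = np.maximum(np.minimum(f, dil(u)), ero(u))
        if np.array_equal(nxt, u):
            break
        u = nxt
    return u


def multiscale_levelings(f, sigmas):
    """Hierarchy (16): h_i = Psi(h_{i-1}, g_i) with h_0 = f and g_i = f * G_sigma_i."""
    f = np.asarray(f, dtype=float)
    levels, h = [], f
    for sigma in sigmas:
        h = leveling(h, gaussian_smooth(f, sigma))
        levels.append(h)
    return levels


def is_leveling(h, f):
    """Neighbour criterion: h_p > h_q implies f_p >= h_p and f_q <= h_q."""
    hp, hq, fp, fq = h[:-1], h[1:], f[:-1], f[1:]
    down, up = hp > hq, hq > hp
    return bool(np.all(fp[down] >= hp[down]) and np.all(fq[down] <= hq[down])
                and np.all(fq[up] >= hq[up]) and np.all(fp[up] <= hp[up]))
```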
Abstract. We present an extension of the scale space idea to surfaces, with the aim of extending ideas like Gaussian derivatives to functions on curved spaces. This is done by using the fact, also valid for normal images, that among the continuous range of scales at which one can look at an image, or surface, there is an infinite discrete subset which has a natural geometric interpretation. We call them “proper scales”, as they are defined by eigenvalues of an elliptic partial differential operator associated with the image, or shape. The computations are performed using the Finite Element technique.



1 Introduction 

Scale space theory studies the dependence of image structure on the level of 
resolution [8]. Most of the time, an observer is interested in the object imaged 
and not purely the complete image. Such an object generically has features at 
very different scales. 

On the other hand, not all possible values of scale show interesting features. Examples are scale lengths smaller than the resolution and, at the other extreme, lengths greater than the “window” of the image. This also depends on the shape of the window, whether it is square or cubic, circular, etc.

The imaging process has an influence on the extracted object, which is only 
an approximation of the imaged one, and thus measurements on the object are 
also approximations. The degree of approximation is dependent on the scale, in 
a broad sense, at which the measurement is made. 

These reasons show a need for the definition of the “right scales” [8] associated 
with an object. Such a scale space theory for objects should show properties 
similar to the ones of the image scale space. The luminance function defining an 
image is a measurement of a given part of the physical world, and it is embedded 
in a family of derived images. In the same way, a measurement on a shape should 
have a corresponding family of derived measurements at a specified resolution. 

* We would like to thank the referees for the useful suggestions.
Our claim is that there is a list of “proper” scales, where by proper we mean
that they are specific to the shape. 

We present the necessary theory for calculating a scale space on curved man- 
ifolds, analogous to the use of Gaussians on standard images. We then describe 
how to implement this theory on discrete data using Finite Element methods. 
Finally, we give example results from simple geometrical shapes. 



2 Theory 

We start by stating some notations and definitions. An image $I$ is a mapping $I : \Omega \to \mathbb{R}^n$, where $\Omega \subset \mathbb{R}^m$. In concrete cases, $m$ is 2 or 3, $n = 1$, and $\Omega$ is an open bounded set with piecewise smooth ($C^\infty$) boundary, e.g. a square or a cube. For discrete images, one should take the intersection of such a domain with a lattice. To be very precise, we should also specify to which class of functions $I$ belongs; for the moment we will just call $H(\Omega)$ the set of possible images on $\Omega$.

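The image $I$ is extended to a family of images parametrised by $S$, i.e. $I_S : \Omega \times S \to \mathbb{R}^n$, $I_S(x,t) = I_t(x)$. The central point is that $I_t$ is defined by convolution with a parameter-dependent kernel

$$\mathcal{K}^\Omega : \Omega \times \Omega \times S \to \mathbb{R}, \qquad (x, y, t) \mapsto \mathcal{K}^\Omega_t(x, y).$$

We write $I_t = I * \mathcal{K}^\Omega_t$.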

The choice of the kernel is central to this theory. Normally, $S = \mathbb{R}^+$ and the Gaussian kernel is chosen:

$$\mathcal{K}^\Omega_t(x,y) = G_t(x-y) = \frac{1}{(2\pi t^2)^{m/2}}\, e^{-\frac{\|x-y\|^2}{2t^2}} \tag{1}$$
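Kernel (1) uses $t$ as the standard deviation, so the kernels compose as $G_s * G_t = G_{\sqrt{s^2+t^2}}$. The sketch below evaluates (1) for points in $\mathbb{R}^m$. It checks numerically, on sampled grids, that the kernel integrates to one in 1D and 2D and that the composition rule holds in 1D. The function name and the grids are our own choices.

```python
import numpy as np


def gaussian_kernel(x, y, t):
    """Kernel (1): G_t(x - y) = (2 pi t^2)^(-m/2) exp(-||x - y||^2 / (2 t^2)).

    x and y are arrays whose last axis holds the m coordinates.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    m = d.shape[-1]
    r2 = np.sum(d ** 2, axis=-1)
    return np.exp(-r2 / (2 * t ** 2)) / (2 * np.pi * t ** 2) ** (m / 2)


xs = np.linspace(-6.0, 6.0, 1201)           # 1D grid, spacing 0.01
dx = xs[1] - xs[0]
origin1 = np.zeros(1)
k1 = gaussian_kernel(xs[:, None], origin1, 0.5)

# Composition check: G_0.3 * G_0.4 should equal G_0.5 (0.3^2 + 0.4^2 = 0.5^2).
composed = np.convolve(gaussian_kernel(xs[:, None], origin1, 0.3),
                       gaussian_kernel(xs[:, None], origin1, 0.4), mode="same") * dx

g2 = np.linspace(-3.0, 3.0, 301)            # 2D grid, spacing 0.02
X, Y = np.meshgrid(g2, g2)
k2 = gaussian_kernel(np.stack([X, Y], axis=-1), np.zeros(2), 0.5)
```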



There are many approaches which lead to this choice; for a historical account, see e.g. [10]. The mathematical foundations for the following can be found in [6], or in [9] for a more introductory text.

We are interested in shapes, usually extracted from an image by segmentation. We will denote a shape by the letter $X$, and define it as being a 2D surface or, if one wants other dimensions, an $m$-dimensional manifold.

Examples would be a sphere or, more interestingly, the cortical surface of a brain extracted by segmentation.



Scale Space on Shapes. The main difference with the case described above is that $X$ usually has no linear structure: one cannot add $x \in X$ to $y \in X$. The Gaussian kernel is meaningless in this case, as are the axioms based on linearity. Nevertheless, there is one approach which can be used straightforwardly here: Iijima's axiomatics of 1971 ([10]). Given an image $I : X \to \mathbb{R}$, let $I_t$ be the scale space of $I$ on $X$ which we want to define. $F$ is the corresponding image flow, i.e. a vector field on the manifold $X$ which gives the strength and direction of the “intensity” change at time $t$. The first principle is the Conservation principle, i.e.

$$\frac{\partial I_t}{\partial t} + \mathrm{div}_X F = 0. \tag{2}$$

The second condition is called the Principle of maximum loss of figure impression. It translates into the requirement that the flux direction be the direction of maximal image gradient, i.e. $F = -\nabla_X I_t$. Using this in equation (2), we have:

$$\frac{\partial I_t}{\partial t} = \Delta_X I_t \quad \text{on } X \tag{3}$$

$$I_0 = I \tag{4}$$

The corresponding kernel is the special solution such that:

$$\frac{\partial}{\partial t}\mathcal{K}^X(x,y,t) = \Delta_X \mathcal{K}^X(x,y,t) \ \text{on } X, \qquad \lim_{t \to 0} \int_X \mathcal{K}^X(x,y,t)\, I(y)\, dy = I(x).$$

For a standard image, the kernel $\mathcal{K}^\Omega$ is a Gaussian, and the Laplacian takes its usual form.

Deriving the Form of the Scale Space Kernel and the Laplacian for a Manifold. First, the degree of differentiability of the manifold $X$ has not been specified. The most general assumptions do not require it to be differentiable, and indeed fractal domains are a current area of mathematical research on the existence of solutions to such heat equations. Here, it is simply assumed that $X$ is at least a manifold. It can have a boundary, with finitely many corners outside of which the boundary is also smooth. At none of these corners should there be a zero angle. Another very important concrete assumption is that $X$ is compact, i.e. in particular bounded. The space $H(X)$ of possible “images” should be such that the partial differential equation above can be defined within that space. One is therefore first tempted to take $C^2(X)$ or even $C^\infty(X)$, but this choice often turns out to be too restrictive and not well adapted to the numerical computations. The idea is to complete $C^\infty(X)$ by limits of sequences of functions, where the limit is taken in the sense of the norm $\|f\|^2_{1,2} = \int_X f^2 + \int_X \|\nabla f\|^2$, and to use partial integration to interpret the diffusion equation above in the weak form. The resulting space is called the Sobolev space $H^1(X)$. The index refers to the fact that we only use one degree of differentiability, as the differential equations are taken in the weak sense. $H^1(X)$ contains non-differentiable functions; for example, Lipschitz continuous functions belong to it.

We are interested in an $X$ which is a curved space, or Riemannian manifold. This allows us to define the differential operators $\mathrm{div}_X$ and $\nabla_X$ on differentiable vector fields and functions over $X$.¹ From these, one can define the Laplace-Beltrami operator $\Delta_X := \mathrm{div}_X \nabla_X$ over $C^2$.

¹ See [9] for expressions in local coordinates.
We can use these operators to construct a nonlinear scale space on a manifold $X$. Interestingly, a special example of a shape is the graph surface defined by the points $(x, y, I(x,y))$ for $(x, y) \in \Omega$. This approach appears in [7]; see also [4], which contains an introduction to Riemannian geometry adapted to this field. Note that this is different from considering $\Omega$ as a flat surface with boundary, e.g. a rectangular window, which corresponds to the classical scale space. There is still a need to construct the kernel explicitly. We will describe such a construction by introducing the main object of this work: the spectrum of the Laplace-Beltrami operator, which will define the set of “natural scales” of the shape.

It can be shown that the Laplace-Beltrami operator can be extended to the complete space $H(X)$ as defined above. It is a linear operator, and thus one can study its eigenvalues. Note that a discrete spectrum is by no means guaranteed for a partial differential operator. Here, however, we can state that there is a sequence of infinitely many eigenvalues, all of the same sign (by convention, positive). Every eigenvalue has finite multiplicity, i.e. for each eigenvalue there exist only finitely many linearly independent eigenfunctions. These eigenfunctions are differentiable and form a complete basis of $H(X)$. The two messages of this work are thus:

1. There is an infinite but discrete set of natural scales $(s_n)_{n \in \mathbb{N}}$ associated with a shape. These scales are defined by the eigenvalues $(\lambda_n)_{n \in \mathbb{N}}$. Explicitly: $s_n = 1/\sqrt{\lambda_n}$.

2. The heat kernel associated with this shape can be constructed from the eigenfunctions. Explicitly, if $u_n$ is a normalised eigenfunction associated with $\lambda_n$,

$$\mathcal{K}^X(x,y,t) = \sum_{n=0}^{\infty} u_n(x)\, u_n(y)\, e^{-\lambda_n t} \tag{5}$$
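Formula (5) can be tried out on a toy discrete "shape". As a stand-in for the Laplace-Beltrami operator, we take (our assumption) the central-difference operator $-d^2/dx^2$ on a closed curve of length $2\pi$ sampled at $n$ points. Its eigenvalues approach $0, 1, 1, 4, 4, \ldots$, the squared integer frequencies of a circle. Building the kernel matrix from the eigenvectors as in (5) gives a symmetric operator. It preserves constants, because the constant eigenvector has $\lambda_0 = 0$, and it satisfies the semigroup property $\mathcal{K}_s \mathcal{K}_t = \mathcal{K}_{s+t}$. Names are hypothetical.

```python
import numpy as np


def cycle_laplacian(n, h):
    """(-d^2/dx^2) by central differences on a closed curve of n samples, spacing h."""
    eye = np.eye(n)
    return (2.0 * eye - np.roll(eye, 1, axis=1) - np.roll(eye, -1, axis=1)) / h**2


def heat_kernel(lam, U, t):
    """Kernel (5) as a matrix: sum_n u_n u_n^T exp(-lambda_n t).

    U holds orthonormal eigenvectors in its columns; multiplying U by the
    row vector exp(-lam * t) scales column n by exp(-lambda_n t).
    """
    return (U * np.exp(-lam * t)) @ U.T


n = 200
h = 2 * np.pi / n                                  # circle of length 2*pi
lam, U = np.linalg.eigh(cycle_laplacian(n, h))     # ascending eigenvalues
```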

Example 1. On a flat square with either free boundary condition (Neumann 
problem), or fixed boundary condition (Dirichlet problem), the eigenfunctions 
are the usual trigonometric functions, and the expansion of a function according 
to the eigenfunctions is just the Fourier series of the function. 

Example 2. On a sphere (radius 1), the metric tensor in polar coordinates is

$$g = \begin{pmatrix} 1 & 0 \\ 0 & \sin^2\theta \end{pmatrix},$$

thus the Laplace-Beltrami operator is

$$\Delta_{\text{sphere}} = \frac{1}{\sin\theta}\,\partial_\theta(\sin\theta\,\partial_\theta) + \frac{1}{\sin^2\theta}\,\partial_\phi^2.$$

The eigenvalue equation is separable. The equation in $\theta$ transforms into a Legendre equation with the change of variable $t := \cos\theta$, and the equation in $\phi$ becomes a simple harmonic equation. The eigenfunctions are the spherical harmonics $Y_{m,n}(\theta,\phi)$ with associated eigenvalues $\lambda_n = n(n+1)$. In particular, $\lambda_n$ has high multiplicity $2n+1$, a sign of the high symmetry of the sphere, cf. [1,2,3].
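The exact sphere spectrum, listed with multiplicity, is the reference against which the discrete computations of the Example section can be compared. The sketch below enumerates the first eigenvalues $n(n+1)/R^2$, each repeated $2n+1$ times. Here $R = 1$ reproduces the unit sphere above, and the $1/R^2$ scaling for other radii is the standard one. It also converts eigenvalues into natural scales $s_n = 1/\sqrt{\lambda_n}$, with $\lambda_0 = 0$ mapped to an infinite scale. Function names are ours.

```python
import math


def sphere_spectrum(count, radius=1.0):
    """First `count` Laplace-Beltrami eigenvalues of a sphere, with multiplicity.

    lambda_n = n(n+1)/radius^2, repeated 2n+1 times (spherical harmonics).
    """
    values = []
    n = 0
    while len(values) < count:
        values.extend([n * (n + 1) / radius ** 2] * (2 * n + 1))
        n += 1
    return values[:count]


def natural_scales(eigenvalues):
    """s_n = 1/sqrt(lambda_n); the zero eigenvalue gives the infinite scale s_0."""
    return [math.inf if lam == 0 else 1.0 / math.sqrt(lam) for lam in eigenvalues]
```

For example, `sphere_spectrum(9)` lists 0 once, 2 three times and 6 five times.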
Interpretation. The eigenvalues of the Laplace-Beltrami operator have units $1/\mathrm{m}^2$, so their inverse square roots $s_n = 1/\sqrt{\lambda_n}$ can be interpreted as wavelengths for small variations of the shape. Intuitively, a large scale corresponds to information on the global aspect of the shape. At a small scale, on the other hand, the shape is assumed to be locally like Euclidean space, so the behaviour should be the same as that expected on flat surface patches. This is indeed the case, as stated by Weyl's asymptotic formula ([1,2,3,6,9]):

$$\lambda_n \sim \frac{n}{\mathrm{Area}(X)} \quad (n \to \infty), \quad \text{up to a multiplicative constant.}$$

The theory of spectral geometry is presented, among others, in [1,2,3,6,9]. One has the following development for any scale space:

$$I_t(x) = \sum_{n=0}^{\infty} e^{-\lambda_n t}\, \langle I, u_n \rangle\, u_n(x).$$

Now, the first eigenvalue (written $\lambda_0$) is zero, so the scale space “image” can be interpreted as a sum of weighted “proper” scale images, where the weights are exponential in the corresponding scale. This explains the asymptotic behaviour. As $t$ becomes large, only the “proper” mode $u_0$, corresponding to the infinite scale $s_0 = \infty$, remains. This gives a constant eigenfunction, $I_t \to \bar{I}$, the mean value of $I$: all the “grey values” are spread over the complete domain. At the other limit ($t \to 0$), one gets the initial condition.
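The development above and its two limits can be checked on a toy discretisation. Here we assume a path of samples with free (Neumann-like) ends, whose graph Laplacian has $\lambda_0 = 0$ with a constant eigenvector. Projecting $I$ onto the orthonormal eigenvectors, damping each coefficient by $e^{-\lambda_n t}$, and summing back returns $I$ at $t = 0$ and the mean value $\bar I$ for large $t$. Function names are hypothetical.

```python
import numpy as np


def path_laplacian(n):
    """Graph Laplacian of a path with free ends; lambda_0 = 0 with constant eigenvector."""
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0
    return L


def spectral_scale_space(I, lam, U, t):
    """I_t = sum_n exp(-lambda_n t) <I, u_n> u_n for orthonormal eigenvectors U."""
    coeffs = U.T @ np.asarray(I, dtype=float)      # <I, u_n>
    return U @ (np.exp(-lam * t) * coeffs)
```

The natural scales $1/\sqrt{\lambda_n}$ of the nonzero eigenvalues decrease as $n$ grows. This ordering places the global modes first and the fine detail last.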

Use of the Finite Element Method to Compute Eigenvalues, Eigenfunctions and the Scale Space Kernel. Up to now, we have supposed that everything was continuous. Our typical application, however, is a surface extracted from an image by segmentation. The image itself, instead of being a differentiable function on a domain, becomes a function on a lattice. After segmentation-reconstruction, one typically gets a discrete version of the surface. As standard reconstruction techniques usually give triangular facets, we will use triangulated surfaces in the examples. The Laplace-Beltrami operator is an elliptic partial differential operator, and thus the Finite Element method is perfectly adapted to the computation of eigenvalues and eigenfunctions. For this, the eigenvalue equation is written in weak form:

$$(\nabla u, \nabla v) = \lambda\, (u, v)$$

theoretically for all $v$ in the solution space. Here, we choose a Finite Element space $S_X(T)$ generated by piecewise linear functions $N_i$, where $i$ runs over the vertices of a triangulation $T$ of $X$. The $N_i$ satisfy $N_i \in C^0(X)$ (the global condition) and $N_i(\text{vertex}_j) = \delta_{ij}$. We seek approximate solutions in $S_X(T)$: $u = \sum_i U^i N_i$. The equation thus has to be valid for all $v = N_j$, i.e.:

$$\sum_i U^i\, (\nabla N_i, \nabla N_j) = \lambda \sum_i U^i\, (N_i, N_j).$$
Writing $K_{ij} := (\nabla N_i, \nabla N_j)$, we get the global stiffness matrix, and $M_{ij} := (N_i, N_j)$ gives the global mass matrix, so the problem becomes that of finding the generalised eigenvalues of the matrices $K$ and $M$:

$$KU = \lambda M U.$$



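These are assembled from their values on the individual facets $T_k$:

$$K_{ij} = \sum_k \int_{T_k} \left(\nabla N_i|_{T_k}, \nabla N_j|_{T_k}\right) dA$$

and

$$M_{ij} = \sum_k \int_{T_k} N_i|_{T_k}\, N_j|_{T_k}\, dA.$$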



Each triangle $T_k$ is the affine image of a standard fixed triangle $T_0$ (isoparametric FE). Let us call this affine map $F_k : \xi \mapsto x$. Dropping the index $k$, and using $dA(x) = |J(\xi)|\, dA(\xi)$, we need to compute for every facet the integrals

$$\int_{F(T_0)} \left(\nabla N_i(x), \nabla N_j(x)\right) dA(x) = \int_{T_0} \left(J^{-T}(\xi)\,\nabla_\xi (N_i \circ F)(\xi),\; J^{-T}(\xi)\,\nabla_\xi (N_j \circ F)(\xi)\right) |J(\xi)|\, dA(\xi),$$

where we have written $J(\xi) := \frac{\partial F}{\partial \xi}(\xi)$. For the simple choice of linear elements made here, there is a big simplification: the derivative of an affine map is just the linear part of this map, and is therefore constant, and the Jacobian determinant is the ratio of the areas of the triangles. This allows very fast matrix assembly.
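A minimal NumPy sketch of this assembly follows, assuming the mesh is given as a vertex array and a list of vertex-index triples (our own data layout, not the authors' code; function names are hypothetical). For a facet embedded in $\mathbb{R}^3$, $J$ is a 3×2 matrix, so the sketch uses the metric $G = J^T J$ (i.e. the pseudo-inverse of $J$), which gives $(\nabla N_i, \nabla N_j) = \hat\nabla N_i^{T} G^{-1} \hat\nabla N_j$ on each facet; the local mass matrix is the standard linear-element one, $\frac{\text{area}}{12}(1 + \delta_{ij})$. The generalised problem $KU = \lambda MU$ is reduced to a standard symmetric one through a Cholesky factorisation of $M$. The octahedron is only a test mesh.

```python
import numpy as np

# Row i: gradient of the linear basis function N_i on the reference triangle T_0
# with corners (0, 0), (1, 0), (0, 1).
REF_GRAD = np.array([[-1.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
# Linear-element mass matrix of a triangle with unit area.
REF_MASS = (np.ones((3, 3)) + np.eye(3)) / 12.0


def assemble(vertices, faces):
    """Global stiffness K and mass M for piecewise linear elements on a triangle mesh in R^3."""
    n = len(vertices)
    K = np.zeros((n, n))
    M = np.zeros((n, n))
    for face in faces:
        idx = np.asarray(face)
        p = vertices[idx]
        J = np.column_stack((p[1] - p[0], p[2] - p[0]))  # 3x2 Jacobian of F, constant on the facet
        G = J.T @ J                                       # metric of the facet
        area = 0.5 * np.sqrt(np.linalg.det(G))            # |J| / 2 = facet area
        K[np.ix_(idx, idx)] += area * REF_GRAD @ np.linalg.solve(G, REF_GRAD.T)
        M[np.ix_(idx, idx)] += area * REF_MASS
    return K, M


def generalised_eigh(K, M):
    """Solve K U = lambda M U via Cholesky reduction to a standard symmetric problem."""
    C = np.linalg.cholesky(M)          # M = C C^T
    Cinv = np.linalg.inv(C)
    lam, Y = np.linalg.eigh(Cinv @ K @ Cinv.T)
    return lam, Cinv.T @ Y             # columns of U are M-orthonormal


# Test mesh: the regular octahedron (6 vertices, 8 facets).
V = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
F = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4), (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
K, M = assemble(V, F)
lam, U = generalised_eigh(K, M)
```

On a closed mesh the smallest eigenvalue should be zero (constant eigenfunction), the rows of $K$ sum to zero, and the entries of $M$ sum to the total area; these are cheap consistency checks for any implementation.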



3 Example 

Figure 1 shows the first 50 eigenvalues computed on a sphere (*), plotted alongside the first 50 for a discrete sphere (+) and a “voxelised” reconstructed sphere (o) (shown below). The voxelised sphere was constructed by filling voxels at a given distance from a centre, which allows the resolution to be specified artificially, and then using a standard reconstruction algorithm (here Nuages [5]) to obtain a triangulated shape; the other one was constructed directly, in a more classical manner, by triangulating the parameter domain. Values have been scaled by area to make them comparable.

This example was designed to illustrate a typical problem in surface reconstruction: although the surfaces look similar from afar, the reconstructed sphere has a very different small-scale structure.
Fig. 1. x-axis: index $n$ of the eigenvalue; y-axis: value of $\lambda_n$, in 1/m². The area is normalised to approximately 4π. Dark: real sphere.




Fig. 2. Left: the shape looks like a sphere on a “large” scale (radius 1), but at the scale of a voxel it is very different; see also Fig. 3.
Fig. 3. At another scale: zooming in on a detail shows the difference between the objects.



Their corresponding spectra reflect this fact: the spectrum of the voxelised sphere is shifted upwards, which we interpret as showing that the small-scale structure of the sphere has a strong influence. The reconstruction was done with the help of the Nuages software. This influence can also be shown by computing local measures, for example curvatures; see Fig. 5 for a simulated example.

The next figure (Fig. 4) shows the “diffusion” on such a sphere, with the initial situation at the top. A random function $I$ with values 0 or 1 has been defined on the set of vertices. If the value at vertex $i$ is larger than 0.5, the facets which contain vertex $i$ are coloured red; if it is between 0.1 and 0.5, they are coloured light red. We compute the eigenvalues and eigenfunctions for this sphere, then compute $I_t := \sum_n e^{-\lambda_n t}\langle I, u_n\rangle u_n$ and apply the same colouring rule. Even for this rough example, one can see how most of the gaps were filled.
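The diffusion-and-colouring experiment can be sketched as below. Two assumptions are ours: the combinatorial graph Laplacian of an octahedron (with an identity mass matrix) stands in for the finite element matrices of the sphere, and a facet with both a red and a light-red vertex is coloured red. Function names are hypothetical; `diffuse` works for any $M$-orthonormal eigenvector matrix, including one produced by a finite element solver.

```python
import numpy as np


def graph_laplacian(n, faces):
    """Combinatorial stand-in for the FE stiffness matrix (assumption, see text)."""
    A = np.zeros((n, n))
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def diffuse(I, lam, U, M, t):
    """I_t = sum_n exp(-lambda_n t) (I, u_n)_M u_n for M-orthonormal eigenvectors U."""
    return U @ (np.exp(-lam * t) * (U.T @ (M @ I)))


def facet_colours(values, faces):
    """Red if any vertex value > 0.5, light red if any lies in (0.1, 0.5], else none."""
    colours = []
    for face in faces:
        v = values[list(face)]
        if np.any(v > 0.5):
            colours.append("red")
        elif np.any(v > 0.1):
            colours.append("light red")
        else:
            colours.append("none")
    return colours


FACES = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4), (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
lam, U = np.linalg.eigh(graph_laplacian(6, FACES))
M = np.eye(6)                                    # identity mass matrix for the stand-in
I0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])    # 0/1 data on the vertices
before = facet_colours(diffuse(I0, lam, U, M, 0.0), FACES)
after = facet_colours(diffuse(I0, lam, U, M, 100.0), FACES)
```

Initially only the four facets around vertex 0 are coloured; after long diffusion every vertex carries the mean value 1/6, so every facet becomes light red and the "gaps" are filled.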

A more concrete example is given in Fig. 5. The first torus, on the left, has 450 facets, and its Gauss curvature is displayed. The settings are such that the curvature range is approximately -0.13 to 0.15. The second torus shows this curvature perturbed by the addition of a random value (generated by Matlab) in the range 0 to 0.1. The last one shows this perturbed curvature after diffusion, at time $t = 0.1$. The aim is to illustrate the blurring property of this technique, which is similar to that of the usual Gaussian kernel: the dark blue and dark red have already disappeared.¹

¹ The figures and some supplementary material will be made available at http://carmen.umds.ac.uk/p.batchelor
Fig. 4. Top: the dark red facets simulate some initial data. Bottom: this initial data has been diffused, resulting in a blurred, more homogeneous pattern.
Fig. 5. Left: the Gauss curvature; right: the initial scalar on the surface is the randomly perturbed Gauss curvature; below: the diffused value ($t = 0.1$, cf. text). Flat shading was chosen in order to avoid any confusion.



4 Discussion

The main contribution of this work is the extension of the scale space idea to curved spaces. Concretely, this is done using techniques for elliptic partial differential equations, solved with the Finite Element method. These techniques are well understood and commonly applied in a variety of domains ranging from fluid mechanics to elasticity theory. This extension of the concept of scale space will allow a wider variety of real world problems to be addressed within a mathematically rigorous, yet computationally viable, framework.

Applications may include the smoothing of surface shape measures made on discretised surfaces, for example in the brain mapping problem. These techniques also permit a natural, observer-independent definition of the scale of objects of interest in such problems, for example sulci and gyri.

Future work. Work in progress includes the application of these techniques to a variety of other geometrical shapes. Future work will include the characterisation of brain surface features from both normal and abnormal human brains within a multiscale paradigm.
Abstract. Electron tomography is a powerful tool for investigating the three-dimensional (3D) structure of biological objects at a resolution in the nanometer range. However, visualization and interpretation of the resulting volumetric data is a very difficult task due to the extremely low signal-to-noise ratio (< 0 dB). In this paper, an approach for noise reduction in volumetric data is presented, based on nonlinear anisotropic diffusion, using a hybrid of the edge enhancing and the coherence enhancing techniques. When applied to both artificial and real data sets, the method turns out to be superior to conventional filters. In order to assess noise reduction and structure preservation experimentally, resolution tests commonly used in structure analysis are applied to the data in the frequency domain.



1 Introduction 

Transmission electron microscopy is used to investigate the structural organization of biological objects (e.g. macromolecular assemblies or cellular organelles) at a resolution in the nanometer range. To a good approximation, the two-dimensional (2D) images obtained are parallel projections of the three-dimensional (3D) density distribution of the object. By means of techniques similar to medical computed tomography, it is possible to reconstruct the 3D density of the specimen and to reveal the 3D structure of the biological object [4]. Although electron microscopes are able to image biological objects with a resolution down to 0.3 nm, the structural information is not directly accessible, since most of the signal is buried in noise (SNR < 0 dB). The standard method in the field is correlation averaging, where many thousands of identical particles are averaged in order to reveal structural information with a resolution down to 0.8 nm. In the case of unique objects (e.g. cells), averaging is not possible and denoising is essential. Particularly with regard to the three-dimensionality of the observed objects, denoising plays an essential role, since the human eye is not able to extract the same amount of information (by interpolation, lowpass filtering, classification, etc.) as in the 2D case. An interpretation of the volumes using surface and volume rendering techniques is difficult due to the noise sensitivity of rendering algorithms. A denoising algorithm suitable for such applications must be able to preserve as much signal as possible while reducing the noise to a sufficiently low level. Nonlinear anisotropic diffusion appears to be a good basis for such an algorithm, as demonstrated by the test calculations and applications presented below.



2 3D electron tomographic reconstruction 

3D electron tomographic reconstruction is a method similar to the well-known tomographic reconstruction in medical imaging (X-ray tomography, etc.). All these methods are based theoretically on the “central section theorem”, which states: the Fourier transform of the 2D projection image corresponds to the central section through the 3D Fourier transform of the object that is perpendicular to the projection direction [8]. This theorem can be used to perform a 3D reconstruction: the 2D Fourier transforms of the projections are computed and placed in the 3D Fourier domain according to the corresponding angle. After interpolation and 3D inverse Fourier transformation, the reconstructed object appears in real space. In practice, a different algorithm, namely filtered backprojection, is mainly used for reconstruction in electron tomography due to its simplicity and general applicability [3].
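The theorem can be checked exactly in its discrete, lower-dimensional form (a 2D object and 1D projections, projection direction along a grid axis so no interpolation is needed). This is our own illustration with random test data: the DFT of the projection equals the central line $k_0 = 0$ of the 2D DFT.

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random((32, 32))                  # a 2D "object" density on a 32 x 32 grid

projection = f.sum(axis=0)                # parallel projection along the first axis
central_line = np.fft.fft2(f)[0, :]       # central section k_0 = 0 of the 2D transform

print(np.allclose(np.fft.fft(projection), central_line))
```

For tilt angles that do not coincide with a grid axis, the section has to be interpolated in the Fourier domain, which is the step the text mentions before the inverse transform.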

The typical experimental approach in electron tomography is to tilt the specimen in 
the microscope about an axis perpendicular to the electron beam and to record an 
image for each tilt view. Unfortunately, the specimen cannot be tilted over the full 
angular range from -90 to +90 degrees, because the specimen holder masks the object 
at high tilt angles. Additionally, the total electron dose has to be kept below a critical 
limit in order to avoid excessive radiation damage. Therefore the number of 
projection views has to be limited and the images suffer from an extremely low 
signal-to-noise ratio. Image shifts resulting from mechanical inaccuracies of the tilt 
stage and from specimen drift require an alignment of the projection images with 
respect to a common origin, a process also prone to errors. As a consequence of all 
these effects, the interpretation of volumetric data obtained by electron tomography is severely hampered by artifacts and a noisy appearance. Of all the artifacts, those arising from the limited tilt range are easy to understand in the Fourier domain (“missing wedge”) [4]; in real space, they may be described by a point spread function expressing an anisotropic resolution. Any approach for noise reduction must not amplify artifacts.



3 Anisotropic diffusion 

The idea introduced in the pioneering work of Perona and Malik [7] is to prefer intraregional smoothing and, consequently, to preserve semantically important features such as edges. Many methods have been proposed for controlling diffusion in order to achieve the best signal preservation. In the implementation described below, different nonlinear anisotropic diffusion methods have been combined, realized in 3D, and adapted to the filtering of electron tomographic reconstructions [2].
The nonlinear anisotropic diffusion procedure applied to a 2D image or 3D volume $I$ can be described in a general form by the following equation:

$$\frac{\partial I}{\partial t} = \operatorname{div}\left(g(\nabla I)\,\nabla I\right), \qquad (1)$$

where $t$ denotes the evolution time and $\nabla I$ the local gradient. In the following we concentrate on the 3D case. The diffusivity $g$ is either a scalar or a matrix function. The crucial question is how the diffusivity has to be designed in order to achieve maximum noise reduction and optimum signal preservation. Following the setup proposed by Weickert [15], the diffusivity is advantageously derived from the structure tensor $J_0$ averaged by convolution with the Gaussian $K_\rho$:

$$J_\rho(\nabla I) = K_\rho * J_0 \quad \text{with} \quad J_0 = \nabla I\,\nabla I^{T}. \qquad (2)$$



Local structural features of $I$ within a neighborhood of size $O(\rho)$ are characterized by the local eigenvectors and eigenvalues of the matrix $J_\rho$. Generally, the eigenvalues describe the variance of the volume data in the direction of the corresponding eigenvectors. In the presence of noise and for $\rho \gg 1$, all eigenvalues $\mu_i$ are positive, since the matrix is positive semidefinite:

$$\mu_1 \geq \mu_2 \geq \mu_3 > 0. \qquad (3)$$

The first eigenvector is parallel to the average gradient orientation, and the corresponding eigenvalue $\mu_1$ reflects the strength of the local gradient; $\mu_2$ provides further information about structural features, e.g. the existence of a surface or a line, and $\mu_3$ can be used as a measure for the noise level.
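A minimal NumPy sketch of equation (2) and the eigenvalues in (3) follows. It is an illustration under our own assumptions, not the authors' implementation: gradients are taken with `numpy.gradient` (central differences in the interior), the Gaussian $K_\rho$ is a truncated separable kernel with zero padding at the borders, and the volume must be longer than the kernel along every axis. Function names are hypothetical.

```python
import numpy as np


def gaussian_smooth(vol, rho):
    """Separable Gaussian convolution K_rho * vol (zero padding at the borders).

    Each axis of vol must be at least as long as the kernel (2 * ceil(3 rho) + 1).
    """
    r = max(1, int(np.ceil(3 * rho)))
    x = np.arange(-r, r + 1)
    kernel = np.exp(-x**2 / (2.0 * rho**2))
    kernel /= kernel.sum()
    out = vol
    for axis in range(vol.ndim):
        out = np.apply_along_axis(np.convolve, axis, out, kernel, mode="same")
    return out


def structure_tensor_eigenvalues(vol, rho):
    """Per-voxel eigenvalues mu_1 >= mu_2 >= mu_3 of J_rho = K_rho * (grad I grad I^T)."""
    grads = np.gradient(vol.astype(float))
    d = vol.ndim
    J = np.empty(vol.shape + (d, d))
    for a in range(d):
        for b in range(a, d):
            J[..., a, b] = J[..., b, a] = gaussian_smooth(grads[a] * grads[b], rho)
    return np.linalg.eigvalsh(J)[..., ::-1]   # eigvalsh sorts ascending; reverse it


ramp = np.indices((16, 16, 16))[0]            # intensity grows along the first axis only
mu = structure_tensor_eigenvalues(ramp, rho=1.0)
```

For the ramp, the tensor has rank one: in the interior $\mu_1 = 1$ (the squared gradient) and $\mu_2 = \mu_3 = 0$, because there is no variation across the gradient direction.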

For 2D applications, Weickert has proposed two different realizations of the diffusion tensor, depending on which structural features should be emphasized [13, 14, 15]. The first one, called edge enhancing diffusion (EED), is basically a correctly discretized Perona-Malik model and performs well at a low signal-to-noise ratio. Edges are evenly enhanced, and piecewise constant subvolumes are produced in between. The second method is called coherence enhancing diffusion (CED). It averages the gradient over a rather large field of the volume and calculates the mean orientation. It is capable of connecting lines interrupted by noise.

In order to cover the larger variety of structural features in 3D as well as the high noise level, one can take advantage of both methods by combining them according to the following strategy. The difference between the first and third eigenvalues reflects the local relation of structure and noise; therefore it can be used as a switch: EED is applied when this value is smaller than a suitably chosen threshold parameter, and CED otherwise. A useful threshold parameter can be derived ad hoc from the variance calculated over a subvolume of $I$ that only contains noise. The appropriate choice of the subvolume can be verified by lowpass filtering or even by visual control of the backprojected images.
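The switching rule can be written as a per-voxel mask. This is a sketch only: the text says the threshold is derived "ad hoc" from the noise variance, so the proportionality `factor` below is a hypothetical tuning parameter, and `mu` is assumed to hold the sorted eigenvalues $\mu_1 \geq \mu_2 \geq \mu_3$ in its last axis.

```python
import numpy as np


def noise_threshold(noise_subvolume, factor=1.0):
    """Ad hoc threshold from the variance of a noise-only subvolume.

    'factor' is a hypothetical tuning parameter, not a value from the paper.
    """
    return factor * float(np.var(noise_subvolume))


def hybrid_mode(mu, threshold):
    """True where EED applies (mu_1 - mu_3 < threshold), False where CED applies."""
    return (mu[..., 0] - mu[..., 2]) < threshold


mu = np.array([[0.2, 0.1, 0.05],    # weak structure: EED
               [5.0, 0.3, 0.1]])    # strong oriented structure: CED
threshold = noise_threshold(np.array([-1.0, 1.0] * 500))   # toy noise region, variance 1
mask = hybrid_mode(mu, threshold)
```

Because the mask is recomputed from the current data at every iteration, the same voxel can switch from EED to CED as its structure builds up, which matches the description of both processes running within one iteration step.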

During the first iterations, EED highlights the edges, while subsequently CED connects the lines and enhances flow-like structures. Both processes take place simultaneously within one iteration step, depending on the local threshold parameter. If the build-up of a specific edge takes more iterations, the other edges are preserved and enhanced in the meantime, so that no signal degradation takes place.

Central differences are used to discretize the model. Additive operator splitting (AOS) schemes [12] clearly performed better than the standard iterative methods, but in order to switch between the different diffusion types, the simulations were performed with the simple explicit Euler method. In order to preserve small structures only a few pixels in size, the gradient approximation proposed by Sethian [10] is used. It performs better than the standard central-difference gradient approximation.
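To show what one explicit Euler step of equation (1) looks like, here is a deliberately simplified sketch: it uses the scalar Perona-Malik diffusivity $g(s) = 1/(1 + s^2/k^2)$ evaluated on one-sided differences between neighbouring voxels, not the EED/CED diffusion tensors or Sethian's gradient approximation used in the paper. The contrast parameter `k` and step size `tau` are our own assumptions.

```python
import numpy as np


def perona_malik_step(I, tau, k):
    """One explicit Euler step of dI/dt = div(g grad I) with g(s) = 1 / (1 + (s/k)^2).

    Fluxes are evaluated between neighbouring voxels, with zero flux across the
    volume boundary. Since 0 < g <= 1, the step obeys a discrete maximum principle
    for tau <= 1 / (2 * I.ndim).
    """
    out = I.astype(float)
    for axis in range(I.ndim):
        d = np.diff(I, axis=axis)              # difference between neighbours along this axis
        flux = d / (1.0 + (d / k) ** 2)        # g(|d|) * d
        pad = [(0, 0)] * I.ndim
        pad[axis] = (1, 1)
        flux = np.pad(flux, pad)               # zero flux through the outer faces
        out += tau * np.diff(flux, axis=axis)  # divergence of the flux
    return out


rng = np.random.default_rng(3)
vol = rng.normal(size=(20, 20, 20))
smoothed = vol
for _ in range(10):
    smoothed = perona_malik_step(smoothed, tau=0.1, k=1.0)
```

The explicit scheme makes it trivial to change the diffusivity voxel by voxel at every step, which is the reason given above for preferring it over AOS here; the price is the step-size limit noted in the docstring.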



4 Applications 

In this section, two examples of the applicability of the hybrid model for 3D visualization in the field of electron microscopy are given. In the first example, the object under scrutiny is a vesicle with actin filaments [1]. The object is 100 nm in diameter and the resolution is ca. 7 nm. The position, connectivity and strength of the filaments, pictured as dark lines in Fig. 1 or as thin white fibers running parallel to the direction of the cylinder in Fig. 2, are the features of interest. The quality of representation of these features can also be used as a criterion for judging the performance of the respective method. For this volume, a comparison with standard techniques in image processing is presented. The second example is one of the first 3D reconstructions of a mitochondrion in vitrified ice [5]. Strength and connectivity of the white fibers are here again the criterion for the judgement.

The vesicle is filtered with a simple lowpass and a median filter. In the case of lowpass filtering, the noise and the signal are degraded simultaneously: although a satisfactorily smooth background is produced, the filaments are thickened and interrupted. The isosurface representation appears corrupted because most of the information is lost. Median filtering results in good edge preservation, but the noise reduction is not satisfactory. Fig. 1 and Fig. 2 show the results of different types of filtering for the vesicle with actin filaments in tomographic and isosurface representation. The results after application of either EED or CED confirm the properties described in the previous section. EED produces the typical staircase effects and gives the volumes an artificial appearance. The connectivity of the filaments is neither improved nor even preserved; EED basically behaves in the opposite way to CED. Finally, the result of the hybrid model is presented. It combines an excellent noise reduction of the background with a clear representation of the filaments.
Fig. 2. Isosurface representation of a vesicle with actin filaments (volume 256 × 256 × 128 voxels). The diameter of the cylinder is about 100 nm and the thickness of the white fibers 7 nm. The order of the representation is the same as in Fig. 1.
The 3D reconstruction of a mitochondrion also gives an impressive example of the applicability of the method (Fig. 3). The goal of this reconstruction was to investigate the 3D structure of the mitochondrion. The structures obtained by electron tomography cannot be visualized satisfactorily without filtering. 3D diffusion filtering using the hybrid EED/CED approach drastically improves the connectivity and thus provides a clear picture of the complicated internal network of the object.




Fig. 3. Isosurface representation of a mitochondrion (800 × 600 × 200 nm). Left: the original data. Right: the result of denoising with the hybrid model.



5 Assessment of signal preservation 



5.1 Correlation averaging 

In electron microscopy, efficient noise reduction for macromolecules is normally achieved by correlation averaging. Before averaging, the signals are brought into register using cross-correlation functions. The method combines the information contained in the images of many individual but structurally identical molecules. Each volume is considered to be a single realization of a statistical process. It provides a signal corresponding to the structure of the object, e.g. a projection view or a density map, degraded by noise. Adding up equivalent signals of $n$ volumes increases the signal-to-noise ratio by a factor of $\sqrt{n}$, assuming additive, signal-independent noise. In the context of this approach, it is possible to estimate the resolution of the averaged volume by comparing the averages of two statistically independent, equal-sized subsets of the corresponding ensemble. The comparison is made by subdividing the Fourier domain into shells and calculating cross-correlation coefficients between the subsets for each of these shells. The resulting radial correlation function (RCF) is a frequency-dependent measure of similarity between the two subsets, and can therefore be used to estimate the resolution [9].
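The shell-wise correlation can be sketched as follows. Assumptions (ours, for illustration): shells have a width of one Fourier pixel and are indexed by the rounded radius, and the coefficient is the usual normalized correlation $\mathrm{Re}\sum F_1 F_2^{*} / \sqrt{\sum |F_1|^2 \sum |F_2|^2}$ per shell. The two "half-set averages" are synthetic: a random signal plus independent noise.

```python
import numpy as np


def shell_index(shape):
    """Rounded radius (in Fourier pixels) of every DFT coefficient of a volume."""
    grids = np.meshgrid(*[np.fft.fftfreq(n) * n for n in shape], indexing="ij")
    return np.rint(np.sqrt(sum(g**2 for g in grids))).astype(int)


def fourier_shell_correlation(v1, v2):
    """Normalized cross-correlation coefficient between v1 and v2 on each Fourier shell."""
    F1, F2 = np.fft.fftn(v1), np.fft.fftn(v2)
    shell = shell_index(v1.shape).ravel()
    num = np.bincount(shell, weights=np.real(F1 * np.conj(F2)).ravel())
    p1 = np.bincount(shell, weights=(np.abs(F1) ** 2).ravel())
    p2 = np.bincount(shell, weights=(np.abs(F2) ** 2).ravel())
    with np.errstate(divide="ignore", invalid="ignore"):
        return num / np.sqrt(p1 * p2)          # NaN for shells without coefficients


rng = np.random.default_rng(4)
signal = rng.normal(size=(16, 16, 16))
half_a = signal + rng.normal(scale=2.0, size=signal.shape)   # average of subset A
half_b = signal + rng.normal(scale=2.0, size=signal.shape)   # average of subset B
rcf = fourier_shell_correlation(half_a, half_b)
```

Identical inputs give a coefficient of 1 on every shell, and by the Cauchy-Schwarz inequality the values always lie in $[-1, 1]$.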
Fig. 4. Results of correlation averaging of two ensembles, represented by slices through the volumetric data. Left: the average of the original particles (noisy ensemble). Right: the average of the particles denoised by EED (denoised ensemble).

Averaging is conceptually free from signal degradation, whereas all other denoising methods smooth not only the noise but, to some extent, also the signal. In order to study how the signal is affected by nonlinear anisotropic diffusion, a set of real volumes of a biological macromolecule was subjected to denoising and averaging. The results were assessed in the frequency domain by means of the RCF. For this purpose, 500 copies were produced from the known 3D density map of the Thermosome molecule [6] and degraded by additive colored noise. Using EED, a denoised version was created from each individual copy. Finally, averaged volumes were calculated from both the original “noisy” volumes (noisy ensemble) and the denoised versions (denoised ensemble). The results are presented in Fig. 4. The average of the denoised ensemble appears smoother, and significant details are suppressed. Clearly, the signal is degraded by the diffusion process.
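The $\sqrt{n}$ gain behind correlation averaging can be illustrated with a toy ensemble. The assumptions are ours: a 1D sine stands in for the density map, and the noise is white Gaussian rather than the colored noise used in the experiment. Averaging $n = 400$ copies should shrink the residual noise standard deviation by about $\sqrt{400} = 20$.

```python
import numpy as np

rng = np.random.default_rng(5)
signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 4096))   # stand-in for the particle density
sigma, n = 2.0, 400                                     # noise level and ensemble size (arbitrary)
ensemble = signal + rng.normal(scale=sigma, size=(n, signal.size))  # additive, signal-independent

average = ensemble.mean(axis=0)
residual_single = np.std(ensemble[0] - signal)    # noise in one copy, about sigma
residual_average = np.std(average - signal)       # noise in the average, about sigma / sqrt(n)
ratio = residual_single / residual_average
print(ratio, np.sqrt(n))
```

The signal itself is untouched by averaging, which is why the text calls the method conceptually free from signal degradation.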

In contrast to the apparent signal degradation, the cross-correlation coefficients of the denoised ensemble are higher than those of the noisy ensemble, indicating a higher resolution. This surprising result is not a contradiction, because nonlinear anisotropic diffusion enhances the SNR and simultaneously reduces the magnitude of the Fourier coefficients. The statement may become clearer when linear diffusion is considered. In this case, the average volume is also blurred, but the RCF is not changed at all. Since linear diffusion is equivalent to linear filtering with a Gaussian kernel, the data in the Fourier domain are damped by a factor which is constant within shells; such a factor cancels in the normalized cross-correlation coefficients, so the coefficients used for the RCF remain unchanged. The RCF curves in Fig. 5 therefore reflect the gain in SNR when linear diffusion is replaced by the edge-enhancing approach.
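The invariance argument can be verified numerically. A true Gaussian is only approximately constant on a discrete shell of finite width, so, as an assumption for this sketch, we use a Gaussian-shaped gain that depends only on the shell index; with that choice the invariance is exact. The shell definition and helper names are ours.

```python
import numpy as np


def shell_index(shape):
    grids = np.meshgrid(*[np.fft.fftfreq(n) * n for n in shape], indexing="ij")
    return np.rint(np.sqrt(sum(g**2 for g in grids))).astype(int)


def fsc(v1, v2):
    """Shell-wise normalized cross-correlation coefficients."""
    F1, F2 = np.fft.fftn(v1), np.fft.fftn(v2)
    s = shell_index(v1.shape).ravel()
    num = np.bincount(s, weights=np.real(F1 * np.conj(F2)).ravel())
    den = np.sqrt(np.bincount(s, weights=(np.abs(F1) ** 2).ravel())
                  * np.bincount(s, weights=(np.abs(F2) ** 2).ravel()))
    return num / den


def shell_constant_gaussian(v, width):
    """Linear low-pass whose gain exp(-(shell/width)^2) depends only on the shell."""
    gain = np.exp(-(shell_index(v.shape) / width) ** 2)
    return np.real(np.fft.ifftn(np.fft.fftn(v) * gain))


rng = np.random.default_rng(6)
signal = rng.normal(size=(16, 16, 16))
a = signal + rng.normal(size=signal.shape)
b = signal + rng.normal(size=signal.shape)
before = fsc(a, b)
after = fsc(shell_constant_gaussian(a, 6.0), shell_constant_gaussian(b, 6.0))
```

The filtered volumes are visibly blurred, yet `before` and `after` agree shell by shell, because the common positive factor cancels between numerator and denominator.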




Fig. 5. Fourier shell correlation function of the denoised and original particles. 



5.2 Frequency equalization 

Edge enhancing diffusion is a nonlinear process and cannot be described by a linear time-invariant theory. Nevertheless, the improvement of the SNR described above partly justifies improving the visual appearance of the average volume by a linear frequency enhancement. The global energy in the volume decreases with increasing evolution time when diffusion is applied (Lyapunov functional [13]). By Parseval's theorem, the energy in the Fourier domain decreases correspondingly. The amount of this decrease can be determined as a function of frequency by investigating volume ensembles. As above, original and denoised volume data representing the Thermosome molecule are used to calculate the root mean square amplitudes on each shell in the Fourier domain. The curve in Fig. 6 shows the ratio of the mean amplitudes of the original and the denoised data and reveals a “bandstop” characteristic of edge-enhancing diffusion.





Fig. 6. Ratio of the root mean square amplitudes in the Fourier domain. 

This function can be used for equalization in the conventional manner. The result of equalizing the average of the denoised particles is shown in Fig. 7. The edges are more distinct and the output looks similar to the average of the original particles. Furthermore, the noise enhancement is minimal.
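The measurement and use of this transfer function can be sketched as below. Since a real diffusion run is not available here, the "denoised" ensemble is produced by a hypothetical shell-wise damping that stands in for the diffusion; with that stand-in, equalization restores the originals exactly, whereas for genuine nonlinear diffusion it can only be approximate. Shell conventions and names are our own.

```python
import numpy as np


def shell_index(shape):
    grids = np.meshgrid(*[np.fft.fftfreq(n) * n for n in shape], indexing="ij")
    return np.rint(np.sqrt(sum(g**2 for g in grids))).astype(int)


def rms_per_shell(volumes):
    """Root mean square Fourier amplitude on each shell, pooled over an ensemble."""
    s = shell_index(volumes[0].shape).ravel()
    power = sum(np.bincount(s, weights=(np.abs(np.fft.fftn(v)) ** 2).ravel()) for v in volumes)
    counts = np.bincount(s) * len(volumes)
    return np.sqrt(power / counts)


def equalize(volume, transfer):
    """Multiply every Fourier shell of 'volume' by the corresponding transfer value."""
    gain = transfer[shell_index(volume.shape)]
    return np.real(np.fft.ifftn(np.fft.fftn(volume) * gain))


def damp(volume):
    """Hypothetical stand-in for the diffusion: a fixed shell-wise damping."""
    h = 1.0 / (1.0 + (shell_index(volume.shape) / 4.0) ** 2)
    return np.real(np.fft.ifftn(np.fft.fftn(volume) * h))


rng = np.random.default_rng(7)
originals = [rng.normal(size=(16, 16, 16)) for _ in range(5)]
denoised = [damp(v) for v in originals]
transfer = rms_per_shell(originals) / rms_per_shell(denoised)   # ratio as plotted in Fig. 6
restored = equalize(denoised[0], transfer)
```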
Fig. 7. Equalization of the nonlinear diffusion process.



The idea arising from this observation is to determine a global “transfer function” 
and to equalize the data in the Fourier domain after the diffusion process. It is an open 
question whether or not such a function can be applied to all objects. We expect that 
the answer will be no, considering the non-linearity of the diffusion procedure and the 
diversity of objects studied by electron tomography. It is perhaps possible to define 
transfer functions for distinct classes of objects. In any case, further investigations are 
needed to clarify this point. 



6 Discussion 

An EED/CED hybrid model of nonlinear anisotropic diffusion techniques has been realized in 3D and adapted to the field of electron tomography. The examples presented in Figs. 1-3 demonstrate satisfactory performance, especially in the case of very “noisy” data (SNR < -1 dB). The smooth background indicates that efficient noise reduction is achieved while the signal is well preserved. The diffusion approach turns out to be clearly superior to conventional methods of noise filtering, e.g. lowpass filtering or median filtering. Most important for electron tomography, the visualization of very complex volume data by isosurface representations or volume rendering is considerably improved, and the interpretation of the results from the biological point of view is facilitated. It is worth noting that the approach takes advantage of both EED and CED while avoiding the artifacts arising from each of these methods. Connectivity and flow-like structures are preserved, while noise reduction and edge enhancement produce a significant SNR improvement.

The design of the diffusion flux is more complicated in 3D than in 2D. An optimum setup should use the full structural information specified locally by all three eigenvalues of the averaged structure tensor. For instance, the second eigenvalue can be used to switch between 2D and 1D flux. However, tests based on this approach gave unsatisfactory results because $\mu_2$ is very sensitive to noise. When a 2D flux is applied erroneously to a 1D structure, artifacts may occur; e.g., a line may be degraded to a sword-like structure due to the Gaussian elongation in the additional direction. Further investigations are necessary to optimize the procedure. This is also true for another drawback of the method, namely the discretization stencil. In the present implementation, central differences are used for the model discretization and, consequently, signals at frequencies near the Nyquist frequency are totally eliminated. This may be improved in a straightforward way by better gradient approximation methods.

One might also ask whether a 2D denoising of the projection images could replace the more complicated and time-consuming 3D denoising of the final tomographic reconstruction. However, the tomographic reconstruction process relies on a linear relationship between the projection images and the density values. Nonlinear diffusion would destroy this relationship, possibly causing severe artifacts. Based on previous experience with another nonlinear denoising technique, so-called wavelet denoising [11], such an approach cannot be recommended.

A fascinating idea is to use the wavelet transformation in conjunction with nonlinear anisotropic diffusion. The transformation could be applied in order to obtain more reliable information on local structures. Recently we have used the wavelet coefficients for estimating the diffusivity parameter. Preliminary results are very encouraging, apart from a slowing down of the process. An extended use of the wavelet transformation requires more detailed investigation. Unfortunately, there is little incentive to develop such an approach, because higher-dimensional applications of the wavelet transformation suffer from artifacts, while more sophisticated translation- and rotation-invariant realizations require an intolerable amount of computing power. For electron tomography, the present setup of nonlinear anisotropic diffusion appears to be the most favorable approach regarding the efficiency of noise reduction, signal preservation and computing effort.
Abstract. We propose a simple approach to the evolution of polygonal curves that is specially designed to fit the discrete nature of curves in digital images. It leads to simplification of shape complexity with no blurring (i.e., shape rounding) effects and no dislocation of relevant features. Moreover, in our approach the problem of determining the size of discrete steps for numerical implementations does not arise, since our evolution method leads in a natural way to a finite number of discrete evolution steps, which are just the iterations of a basic vertex-deletion procedure.



Keywords: discrete curve evolution, shape simplification, shape recognition 



1 Introduction 

We assume that a closed polygon P is given (which need not be simple). In particular, any boundary curve in a digital image can be regarded as a polygon without loss of information, with possibly a large number of vertices.

The main motivation for the presented discrete curve evolution is the fact that the boundary of a segmented object in a digital image contains misinformation but misses no information. Clearly, there is digitization and segmentation noise on the boundary of a segmented object, which results in displacement of the boundary points. However, as long as it is possible to recognize the overall shape of the object, the shape information is contained in the given contour.

Most of the standard approaches in computer vision try to compute the 
original position of the displaced boundary points. This is only possible if the 
class of shapes to which the analyzed shape belongs is explicitly known and is 
sufficiently restrictive, e.g., fitting ellipses. 

On the other hand, it is not necessary to recover the original position of the 
boundary points in order to recognize the shape. A pointwise interpretation of 
this fact is that there exists a subset A of the set of the boundary points B 
that is sufficient to represent the shape of the object. The other points in B \ A
are either redundant for the shape or have been influenced by noise. Clearly, the
points in the set A may also be displaced due to noise, but nevertheless they 
are sufficient to recognize the shape, if the amount of displacement is such that 
people can still recognize the shape. For example, this is the case for the contour 



M. Nielsen et al. (Eds.): Scale-Space’99, LNCS 1682, pp. 398—409, 1999. 
© Springer- Verlag Berlin Heidelberg 1999 




Polygon Evolution by Vertex Deletion 399 



of the building obtained from an aerial image in Figure 1 (cf. Brunn et al. [3], 
Fig. 4), where it is still possible to recognize the overall shape, although the 
amount of displacement of boundary points is relatively large. 




Fig. 1. It is possible to recognize the overall shape of the building, although the amount 
of displacement of boundary points is relatively large. 



The presented discrete curve evolution allows us, for a given object boundary, to find a subset A of the set of the boundary points B that is sufficient to represent the shape of the object, i.e., points important for the object shape remain after the application of the discrete curve evolution. For example, compare the contour (a) with (c) in Figure 2, where the contours (b) and (c) are obtained from (a) by our discrete curve evolution. Observe also the enormous data reduction: contour (c) in Figure 2 contains only 3% of the points of contour (a).
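The vertex-deletion idea can be sketched as a loop that repeatedly removes the least relevant vertex. The relevance measure is not defined in this excerpt; as an assumption, the sketch uses the measure $K = \beta\, l_1 l_2 / (l_1 + l_2)$ from related work by Latecki and Lakämper, where $\beta$ is the turn angle at the vertex and $l_1, l_2$ are the lengths of its two adjacent segments normalized by the polygon length. Function names are hypothetical, and consecutive duplicate vertices are assumed absent.

```python
import math


def turn_angle(p, q, r):
    """Absolute turn angle at q between segments pq and qr (0 for collinear points)."""
    a1 = math.atan2(q[1] - p[1], q[0] - p[0])
    a2 = math.atan2(r[1] - q[1], r[0] - q[0])
    d = abs(a2 - a1) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def relevance(p, q, r, total_length):
    """Assumed measure K = beta * l1 * l2 / (l1 + l2), lengths normalized by the perimeter."""
    l1 = math.dist(p, q) / total_length
    l2 = math.dist(q, r) / total_length
    return turn_angle(p, q, r) * l1 * l2 / (l1 + l2)


def evolve(polygon, n_keep):
    """Delete the vertex of least relevance until n_keep (at least 3) vertices remain."""
    poly = list(polygon)
    while len(poly) > max(n_keep, 3):
        m = len(poly)
        total = sum(math.dist(poly[i], poly[(i + 1) % m]) for i in range(m))
        scores = [relevance(poly[i - 1], poly[i], poly[(i + 1) % m], total) for i in range(m)]
        del poly[scores.index(min(scores))]
    return poly


# A square whose edge midpoints are slightly displaced by "noise".
square = [(0, 0), (2, 0.1), (4, 0), (4.1, 2), (4, 4), (2, 3.9), (0, 4), (-0.1, 2)]
print(evolve(square, 4))
```

The displaced midpoints have small turn angles and are deleted first, leaving the four corners; no coordinates are moved, so the surviving vertices are a subset of the original boundary points, as described above.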

The fact that the discrete curve evolution allows us to find a subset A of the set of the boundary points B that is sufficient to represent the shape of the object is justified not only by experimental results, some of which we present in this paper, but also by the continuity theorem in [7]. This theorem states that if polygon B is sufficiently close to a polygon A, then the evolved version of polygon B will remain close to polygon A.

In scale-space theory, a curve (or surface) $B$ is embedded into a continuous family $\{B_t : t \geq 0\}$ of gradually simplified versions. The main idea of scale-spaces is that the original curve (or surface) $B = B_0$ should get more and more simplified, and noise and small structures should vanish as the parameter $t$ increases. Thus, due to different scales (values of $t$), it is possible to separate small details from relevant shape properties. The ordered sequence $\{B_t : t \geq 0\}$ is referred to as the evolution of $B$. Scale-spaces find wide application in computer vision, in particular due to smoothing (⇒ noise influence is reduced) and elimination of small details (⇒ relevant shape features remain). Some of the main applications are quality enhancement of images, noise removal, and shape description and recognition (e.g., see Sethian [12]).
Fig. 2. (a) → (b): noise elimination; (b) → (c): extraction of relevant line segments.



The scale-space evolution is mostly based on parabolic partial differential equations. The oldest and best-studied scale-spaces are based on a linear diffusion equation (also called the geometric heat equation), e.g., see Weickert [15]. The solutions of diffusion equations can be obtained by convolving the original curve (or surface) with a Gaussian function with parameter $t$ (Kimia and Siddiqi [5]). Hence the solutions correspond to Gaussian smoothing of the original curve (or surface) with support size $t$. This leads to a multiscale, curvature-based shape representation.
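The following is a minimal sketch, our own illustration rather than code from [5] or [15], of Gaussian smoothing applied to a closed polygon: the x and y coordinate sequences are circularly convolved with a sampled, normalized Gaussian. It assumes the kernel is indexed by vertex position rather than by arc length, and `gaussian_smooth_closed` and `sigma` are hypothetical names. Its only purpose is to show that such smoothing moves every vertex.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def gaussian_smooth_closed(points: List[Point], sigma: float) -> List[Point]:
    """Smooth a closed polygon by circular convolution of its x and y
    coordinate sequences with a sampled, normalized Gaussian.

    Assumption: sigma is measured in vertex indices, not arc length, so this
    is only a rough discrete stand-in for Gaussian scale-space of curves.
    """
    n = len(points)
    radius = max(1, math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    total = sum(weights)
    weights = [w / total for w in weights]  # kernel sums to 1
    smoothed = []
    for i in range(n):
        # indices wrap around because the polygon is closed
        x = sum(w * points[(i + k) % n][0] for w, k in zip(weights, offsets))
        y = sum(w * points[(i + k) % n][1] for w, k in zip(weights, offsets))
        smoothed.append((x, y))
    return smoothed


# A square sampled with 8 vertices (corners and edge midpoints).
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
smoothed = gaussian_smooth_closed(square, sigma=1.0)
moved = sum(1 for p, q in zip(square, smoothed) if math.dist(p, q) > 1e-9)
print(moved, "of", len(square), "vertices were displaced")
```

All eight vertices change position, corners and edge midpoints alike, while the centroid is preserved because the kernel is normalized. This is the kind of displacement that problems (a) and (b) below refer to.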

Along with the advantages of evolution based on the linear diffusion equation, 
there are also some serious problems (Weickert [15], p. 6): 

(a) “Gaussian smoothing does not only reduce noise, but also blurs important 
features such as edges and, thus, makes them harder to identify. Since Gaus- 
sian scale-space is designed to be completely uncommitted, it cannot take 
into account any a-priori information on structures which are worth being 
preserved (or even enhanced). 

(b) Diffusion dislocates features when moving from finer to coarser scales. So 
features identified at a coarse scale do not give the right location and have 
to be traced back to the original image [16]. In practice, relating dislocated 
information obtained at different scales is difficult and bifurcations may give 
rise to instabilities. These coarse-to-fine tracking difficulties are generally 
denoted as the correspondence problem. ” 

To reduce these problems, many anisotropic and nonlinear diffusion processes have been proposed for scale-spaces (for an overview, see Weickert [15]). Reaction-diffusion equations, which lead to reaction-diffusion scale-spaces, have also been considered (Kimia et al. [6]).

We propose a different approach to scale-space evolution in which both problems simply do not occur. Our starting point is the discrete nature of curves and surfaces in digital images. In contrast to standard approaches in scale-spaces, our evolution is guided neither by differential equations nor by Gaussian smoothing, and it is not a discrete version of an evolution by differential equations, as is the case in Bruckstein et al. [2]. The main properties of the proposed evolution are (see Figure 3):
• Although it leads to noise elimination, it does not introduce any blurring effects.

• Although irrelevant features vanish during our evolution, there is no dislo- 
cation of relevant features. 









Fig. 3. A few stages of our curve evolution. The first contour is a distorted version of 
the contour on www-site [17]. 



In comparison to scale-space methods, the main differences are:

1. In numerical implementations of diffusion equations, every vertex of the polygon is translated in a single evolution step, whereas in our approach the remaining vertices do not change their positions.

2. The translation vector of each point in a diffusion process is locally determined, whereas our polygonal evolution is guided by a relevance measure that is not a local property with respect to the original polygon.

3. The process of the polygonal evolution is parameter-free.

Although there exist diffusion processes that are parameter-free in the sense that constant parameter values are known that apply to large classes of curves, most numerical implementations of parabolic differential equations require several parameters, and it is theoretically unknown how to relate and determine them. This is due to

(c) problems with stability and computation time of discrete, numeric realiza- 
tions of diffusion processes. 

An example problem is to specify the discrete time steps t necessary for a stable numeric computation. Since scale-space theories are continuous theories, i.e., the scale (or time) parameter t varies over the positive real numbers, the determination of discrete steps is a non-trivial problem: if the steps are too large, too many relevant features may vanish, and on the other hand, too small discrete steps lead to an inefficient computation. Additionally, a given digital curve (or surface) has some fixed grid resolution that cannot be made infinitely small, and this resolution does not always satisfy the requirements for stable numerical solutions of partial differential equations. A different but related problem is the following:

(d) “Diffusion filters with a constant steady-state require to specify a stopping time if one wants to get nontrivial results.” (Weickert [15], p. 19)

Clearly, if the stopping time (i.e., the stopping parameter t) is too large, it can happen that relevant features no longer exist at scale t.

The proposed evolution method leads in a natural way to a finite number of discrete evolution steps, which are just the iterations of a basic procedure of vertex removal. Thus, the problem of determining the size of discrete steps does not occur. This also drastically simplifies the problem of choosing a stopping time.

2 Discrete Curve Evolution 

Let $P$ be a closed polygon (that does not need to be simple). We denote the vertices of $P$ by $\mathrm{Vertices}(P)$. A discrete curve evolution produces a sequence of polygons $P = P^0, \dots, P^m$ such that $|\mathrm{Vertices}(P^m)| \le 3$, where $|\cdot|$ is the cardinality function. Each vertex $v$ in $P^i$ is assigned a relevance measure $K(v, P^i) \in \mathbb{R}_{\ge 0}$. The relevance measure $K(v, P^i)$ that we used for our experiments is defined below. The process of the discrete curve evolution is very simple:

For every evolution step $i = 0, \dots, m-1$, a polygon $P^{i+1}$ is obtained after the vertices whose relevance measure is minimal have been deleted from $P^i$.

In order to give a precise definition of the discrete curve evolution, we first define $K_{\min}(P^i)$ to be the smallest value of the relevance measures for vertices of $P^i$:

$$K_{\min}(P^i) = \min\{K(u, P^i) : u \in \mathrm{Vertices}(P^i)\},$$

and the set $V_{\min}(P^i)$ to contain the vertices whose relevance measure is minimal in $P^i$:

$$V_{\min}(P^i) = \{u \in \mathrm{Vertices}(P^i) : K(u, P^i) = K_{\min}(P^i)\}$$

for $i = 0, \dots, m-1$.

Definition: For a given polygon $P$ and a relevance measure $K$, we call a discrete curve evolution a process that produces a sequence of polygons $P = P^0, \dots, P^m$ such that

$$\mathrm{Vertices}(P^{i+1}) = \mathrm{Vertices}(P^i) \setminus V_{\min}(P^i) \quad \text{for } i = 0, \dots, m-1,$$

where $|\mathrm{Vertices}(P^m)| \le 3$.
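The definition above translates almost directly into Python. This is a sketch under stated assumptions: the polygon is a list of (x, y) tuples, the relevance measure is any function of a vertex and its two neighbours, and `discrete_curve_evolution`, `abs_turn_angle`, and `square8` are hypothetical names. The stand-in relevance `abs_turn_angle` only exercises the loop; it is not the measure of Eq. (2).

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical signature: relevance of vertex v given its neighbours u and w.
Relevance = Callable[[Point, Point, Point], float]


def discrete_curve_evolution(polygon: Sequence[Point],
                             relevance: Relevance) -> List[List[Point]]:
    """Return the stages P^0, ..., P^m of the discrete curve evolution.

    In every step all vertices in V_min(P^i), i.e. all vertices whose
    relevance equals K_min(P^i), are deleted at once. Remaining vertices keep
    their coordinates. The loop stops as soon as at most 3 vertices remain.
    """
    current = list(polygon)
    stages = [current]
    while len(current) > 3:
        n = len(current)
        # current[i - 1] wraps to the last vertex for i == 0 (closed polygon)
        values = [relevance(current[i - 1], current[i], current[(i + 1) % n])
                  for i in range(n)]
        k_min = min(values)
        # exact equality, as in the definition of V_min; with floats, ties
        # are only detected when the values are bit-for-bit identical
        current = [v for v, k in zip(current, values) if k != k_min]
        stages.append(current)
    return stages


def abs_turn_angle(u: Point, v: Point, w: Point) -> float:
    """Stand-in relevance (NOT Eq. (2)): absolute turn angle at v, in [0, pi]."""
    a = math.atan2(v[1] - u[1], v[0] - u[0])
    b = math.atan2(w[1] - v[1], w[0] - v[0])
    return abs((b - a + math.pi) % (2 * math.pi) - math.pi)


square8 = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
for i, stage in enumerate(discrete_curve_evolution(square8, abs_turn_angle)):
    print(i, stage)
```

On the 8-vertex square, the first step deletes all four collinear edge midpoints at once, since they share the minimal value 0, and the corner coordinates are left untouched. Exact equality mirrors the definition of $V_{\min}$, but with floating-point values, near-ties may be resolved differently than in exact arithmetic.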

The process of the discrete curve evolution is guaranteed to terminate, since in every evolution step the number of vertices decreases by at least one. It is also obvious that this evolution converges to a convex polygon, since the evolution will reach a state where there are exactly three, two, one, or no vertices in $P^m$. Clearly, the only polygon with three vertices is a triangle. (Of course, for many curves a convex polygon with more than three vertices can be obtained at an earlier stage of the evolution.) The only polygon with two vertices is a line segment. A polygon with one vertex is also trivially convex. Only when the set $\mathrm{Vertices}(P^m)$ is empty do we obtain a degenerate polygon equal to the empty set, which is trivially convex. Thus, we obtain for every relevance measure:



Proposition 1. The discrete curve evolution converges to a convex polygon, i.e., there exists $0 \le i \le m$ such that $P^i$ is convex, and if $0 \le i < m$, all polygons $P^{i+1}, \dots, P^m$ are convex. ■



This proposition demonstrates the mathematical simplicity of the relation between our evolution approach and the geometric properties of the evolved polygons. Observe that this proposition also holds for polygons that are not simple (i.e., that have self-intersections). An analogous theorem for the evolution of continuous planar curves by diffusion equations is a deep and highly non-trivial result of differential geometry. It holds only for simple closed smooth curves evolved by the heat equation:

Theorem (Grayson [4]). An embedded planar curve converges to a simple convex curve when evolving according to

$$\frac{\partial C(s,t)}{\partial t} = \frac{\partial^2 C(s,t)}{\partial s^2} = k(s,t)\,N(s,t), \qquad C(s,0) = C_0(s), \qquad (1)$$

where $C : S^1 \times [0, T) \to \mathbb{R}^2$ is a family of smooth simple curves, $s$ is the Euclidean arc length, $k$ the Euclidean curvature, and $N$ the inward unit normal. The diffusion equation (1) is called a geometric heat equation for a curve. The flow given by (1) is called the Euclidean shortening flow.

Polygonal analogs of the evolution by diffusion equations are presented in Bruckstein et al. [2]. The experiments in [2] indicate that an arbitrary initial polygon converges to a convex polygon (a polygonal circle). However, a proof of this fact in the Euclidean case is an open question. In [2], as well as in evolutions by numerical solutions of differential equations, each vertex of the polygon with nonzero curvature is displaced in a single evolution step, whereas in our approach some vertices are removed and the remaining vertices do not change their positions. This is an important difference, which leads to several properties of our approach (described in the next section) that are favorable for many applications.

The convexity result (and some other properties of the discrete curve evo- 
lution) holds for any relevance measure. However, there are some important 
properties like continuity that depend on the choice of the relevance measure 
(see Section 3). 

The key property of the evolution we used for our experiments is the order of deletion determined by the relevance measure. Our relevance measure $K(v, P^i)$ depends on vertex $v$ and its two neighbor vertices $u, w$ in $P^i$, i.e., $K(v, P^i) = K(v, u, w)$. It is given by the formula

$$K(v, u, w) = K(\beta, l_1, l_2) = \frac{\beta\, l_1 l_2}{l_1 + l_2}, \qquad (2)$$

where $\beta$ is the turn angle at vertex $v$ in $P^i$, $l_1$ is the length of $vu$, and $l_2$ is the length of $vw$. (Both lengths are normalized with respect to the total length of polygon $P^i$.) Intuitively, it reflects the shape contribution of vertex $v$ in $P^i$. The main property is the following:

• The higher the value of $K(v, u, w)$, the larger the contribution of arc $uv \cup vw$ to the shape of polygon $P^i$.

Observe that this relevance measure is not a local property with respect to the polygon $P$, although its computation is local in $P^i$ for every vertex $v$. The motivation for this measure and its properties are discussed in [8].
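Equation (2) can be computed locally for each vertex of $P^i$. The sketch below assumes that the turn angle $\beta$ is taken in absolute value (in radians) and that lengths are normalized by the perimeter of the current polygon; the function names and the test contour are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def turn_angle(u: Point, v: Point, w: Point) -> float:
    """Absolute turn angle at v (radians, in [0, pi]) for the walk u -> v -> w."""
    a = math.atan2(v[1] - u[1], v[0] - u[0])
    b = math.atan2(w[1] - v[1], w[0] - v[0])
    return abs((b - a + math.pi) % (2 * math.pi) - math.pi)


def relevance(u: Point, v: Point, w: Point, perimeter: float) -> float:
    """K(beta, l1, l2) = beta * l1 * l2 / (l1 + l2), Eq. (2).

    Lengths are divided by the perimeter of the current polygon.
    Assumes consecutive vertices are distinct, so l1 + l2 > 0.
    """
    beta = turn_angle(u, v, w)
    l1 = math.dist(v, u) / perimeter
    l2 = math.dist(v, w) / perimeter
    return beta * l1 * l2 / (l1 + l2)


def relevances(polygon: List[Point]) -> List[float]:
    """K(v, P) for every vertex of the closed polygon, in vertex order."""
    n = len(polygon)
    perimeter = sum(math.dist(polygon[i], polygon[(i + 1) % n]) for i in range(n))
    return [relevance(polygon[i - 1], polygon[i], polygon[(i + 1) % n], perimeter)
            for i in range(n)]


# Square-like contour with a small noise spike on its bottom edge.
contour = [(0, 0), (5, 0), (5.1, -0.3), (5.2, 0), (10, 0), (10, 10), (0, 10)]
for p, k in zip(contour, relevances(contour)):
    print(p, round(k, 5))
```

For this contour, the three vertices of the small spike receive relevance values more than ten times smaller than those of the four real corners, so they would be deleted first. Because both lengths are divided by the perimeter, scaling the whole contour leaves the values unchanged.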

An algorithmic definition of the discrete curve evolution is given in [8], and live examples can be found on our www-site [9]. The curve evolution in [8] differs from the one defined here if two or more vertices in $P^i$ have the same relevance measure: the evolution in [8] removes only one vertex in a single step. If in the course of the evolution no two vertices in $P^i$ have the same relevance measure, then the algorithmic definition in [8] and the above definitions are equivalent.

3 Properties of the Discrete Curve Evolution 

We will show in this section that our discrete curve evolution has the following properties that do not depend on the choice of the relevance measure:

(P1) It leads to a simplification of shape complexity.

(P2) It does not introduce any blurring (i.e., shape rounding) effects, and
(P3) there is no dislocation of relevant features,

due to the fact that the remaining vertices do not change their positions. Two more important properties of our curve evolution are based on the relevance measure defined in Section 2:

(P4) It is stable with respect to noisy deformations, and noise elimination takes place in early stages of the evolution.

(P5) It allows us to find line segments in noisy images, due to the relevance order of the repeated process of linearization (e.g., Figure 2).

We begin with some examples that illustrate these properties. A few stages of the proposed curve evolution in Figure 3 illustrate the reduction of shape complexity. Observe that our curve evolution does not introduce any blurring effects, which for curves result in shape rounding (for a comparison, see the curve evolution on www-site [17], based on [10]). There is no dislocation of the remaining relevant shape features, since the planar position of the remaining points of the digital polygon is unchanged. This is demonstrated by marking the corresponding points with the same symbols in Figure 4. Observe also the stability of feature points with respect to noise deformations, shown in the second row of Figure 4.

Fig. 4. Discrete curve evolution is stable with respect to distortions. The same planar position of the points marked with the same symbols demonstrates that there is no displacement of the remaining feature points.

By comparing curves (a) and (b) in Figure 2, it can be seen that our evolution method first allows us to eliminate the influence of noise without changing the shape of objects (P1). If we continue to evolve curve (b), the deletion of vertices guided by our relevance measure results in a process of repeated linearization. In this way, the original line segments can be recovered in noisy images; see Figure 2(c) (cf. Brunn et al. [3], Fig. 4).

Now we give a more formal justification of the above properties. The reduction of shape complexity of a polygonal curve during the evolution process (P1) is justified by Proposition 1. Additionally, the shape complexity of a polygonal curve can be measured by the sum of the absolute values of its turn angles. Let $C$ be a closed polygonal curve with vertices $v_0, \dots, v_{n-1}$. Then the shape complexity of $C$ is given by

$$SC(C) = \sum_{i=0}^{n-1} |\mathrm{turn}(v_i)|,$$

where $\mathrm{turn}(v_i)$ is the turn angle at vertex $v_i$ in $C$. Clearly, the shape complexity of any closed convex curve is $2\pi$, and the shape complexity of a closed non-convex curve is greater than $2\pi$.
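The shape complexity is straightforward to compute. A minimal sketch follows, with hypothetical names and example polygons; turn angles are taken in $[-\pi, \pi)$ and summed in absolute value.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def signed_turn(u: Point, v: Point, w: Point) -> float:
    """Turn angle at v in [-pi, pi): positive for a left turn, negative for a right turn."""
    a = math.atan2(v[1] - u[1], v[0] - u[0])
    b = math.atan2(w[1] - v[1], w[0] - v[0])
    return (b - a + math.pi) % (2 * math.pi) - math.pi


def shape_complexity(polygon: List[Point]) -> float:
    """SC(C): sum of absolute turn angles over all vertices of the closed curve."""
    n = len(polygon)
    return sum(abs(signed_turn(polygon[i - 1], polygon[i], polygon[(i + 1) % n]))
               for i in range(n))


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]  # (1, 1) is a reflex vertex
pentagon = [(0, 0), (2, 0), (2, 1), (1, 2), (0, 2)]  # l_shape without (1, 1)
for name, poly in [("square", square), ("L-shape", l_shape), ("pentagon", pentagon)]:
    print(name, shape_complexity(poly) / math.pi)
```

The square yields $2\pi$, the L-shaped hexagon with one reflex vertex yields $3\pi$, and deleting that reflex vertex gives a convex pentagon whose shape complexity is again $2\pi$, in line with Proposition 2 below.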

Proposition 2. The shape complexity $SC(C)$ of a closed polygonal curve $C$ is monotonically non-increasing in the course of the discrete evolution, i.e., if $C = C^0, \dots, C^m$ with $|C^m| \le 3$ is a sequence of simplified curves obtained by the evolution of $C$, then $SC(C^k) \ge SC(C^{k+1})$ for $0 \le k \le m - 1$.



Proof: The curves $C^k$ and $C^{k+1}$ differ by at least one vertex, say $v_d \in C^k \setminus C^{k+1}$. Let $v_{d-1}$ and $v_{d+1}$ denote the neighbor vertices of $v_d$ in $C^k$, and let $A$ be the polygonal subarc of $C^k$ composed of the four digital line segments whose endpoints are the vertices $v_{d-2}, v_{d-1}, v_d, v_{d+1}, v_{d+2}$. If $A$ is a convex arc, then $SC(C^k) = SC(C^{k+1})$ (e.g., see Figure 5(a)). If $A$ is not a convex arc, then $SC(C^k) > SC(C^{k+1})$ (e.g., see cases (b), (c), and (d) in Figure 5). ■





Fig. 5. The shape complexity remains the same (a) or decreases (b), (c), and (d) after 
a single vertex has been deleted. 



The following proposition is a direct consequence of the definition of the 
evolution procedure: 

Proposition 3. Let $C = C^0, \dots, C^m$ with $|C^m| \le 3$ be a sequence of simplified curves obtained by the discrete evolution. For every vertex $v$ of the digital polygonal curve $C$ that also belongs to $C^k$, the position of $v$ in the plane as a vertex of $C$ is the same as the position of $v$ as a vertex of $C^k$. □

From Proposition 3, it clearly follows that there is no dislocation of the remaining features during the curve evolution. Thus, in our approach the correspondence problem of coarse-to-fine tracking does not occur. In contrast, in the course of curve evolution guided by diffusion equations, all points with non-zero curvature change their positions during the evolution. Proposition 3 also explains why our curve evolution does not introduce any blurring (i.e., rounding) effects: in a single evolution step, all vertices remain at their Euclidean positions, with the exception of the removed vertices. The two neighbor vertices of a removed vertex are joined by a new line segment, which does not lead to any rounding effects.

We proved that the discrete curve evolution with the relevance measure $K(v, u, w)$ is continuous (Theorem 1 in [7]): if polygon $Q$ is close to polygon $P$, then the polygons obtained by their evolution are close. Continuity guarantees the stability of the discrete curve evolution with respect to noise (P4), which we observed in numerous experiments.

The fact that noise elimination takes place in early stages of the evolution is justified by the relatively small values of the relevance measure for vertices resulting from noise:

Mostly, if two adjacent line segments result from noise distortions, then whenever their turn angle is relatively large, their length is very small, and whenever their length is relatively large, their turn angle is very small. This implies that if arc $uv \cup vw$ results from noise distortions, the value $K(v, u, w)$ of the relevance measure at vertex $v$ will be relatively low with high probability. Hence noise elimination will take place in early stages of the evolution. This fact also contributes to the stability of our curve evolution with respect to distortions introduced by noise.

The justification of property (P5) is based on the fact that the evolution of polygon $Q$ corresponds to the evolution of polygon $P$ if $Q$ approximates $P$ (Theorem 2 in [7]): if polygon $Q$ is close to polygon $P$, then first all vertices of $Q$ that are not close to any vertex of $P$ are deleted, and then, whenever a vertex of $P$ is deleted, a vertex of $Q$ that is close to it is deleted in the corresponding evolution step of $Q$. Therefore, the linear parts of the original polygon will be recovered during the discrete curve evolution.

4 Topology-Preserving Discrete Evolutions 

Our discrete curve evolution yields results consistent with our visual perception even if the original polygonal curve $P$ has self-intersections. However, it may introduce self-intersections even if the original curve is simple (e.g., see Figure 6). We now present a simple modification that does not introduce any self-intersections for a simple polygon $P$.

We say that a vertex $v_j \in \mathrm{Vertices}(P^i)$ is blocked in $P^i$ if the triangle $v_{j-1}v_jv_{j+1}$ contains a vertex of $P^i$ different from $v_{j-1}, v_j, v_{j+1}$. We denote the set of all blocked vertices in $P^i$ by $\mathrm{Blocked}(P^i)$.
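The blocking test reduces to a point-in-triangle check. The sketch below treats the triangle as closed (vertices lying on its boundary count as contained), which is our assumption about the definition; the helper names and the U-shaped example are hypothetical.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); its sign tells on which side of line oa point b lies."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_closed_triangle(p: Point, a: Point, b: Point, c: Point) -> bool:
    """True if p lies inside triangle abc or on its boundary."""
    # the bounding-box check also handles degenerate (collinear) triangles
    if not (min(a[0], b[0], c[0]) <= p[0] <= max(a[0], b[0], c[0]) and
            min(a[1], b[1], c[1]) <= p[1] <= max(a[1], b[1], c[1])):
        return False
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def blocked_vertices(polygon: List[Point]) -> Set[int]:
    """Indices j such that triangle v_{j-1} v_j v_{j+1} contains another vertex."""
    n = len(polygon)
    blocked = set()
    for j in range(n):
        neighbours = {(j - 1) % n, j, (j + 1) % n}
        a, b, c = polygon[j - 1], polygon[j], polygon[(j + 1) % n]
        if any(in_closed_triangle(polygon[k], a, b, c)
               for k in range(n) if k not in neighbours):
            blocked.add(j)
    return blocked


u_shape = [(0, 0), (4, 0), (4, 4), (3, 4), (3, 1), (1, 1), (1, 4), (0, 4)]
print(sorted(blocked_vertices(u_shape)))
```

Only vertices 0 and 1, the two bottom corners of the U, are blocked: deleting either one would join its neighbours by a segment that crosses other edges of the polygon.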

Definition: For a given polygon $P$ and a relevance measure $K$, the process of the discrete curve evolution in which

$$K_{\min}(P^i) = \min\{K(u, P^i) : u \in \mathrm{Vertices}(P^i) \setminus \mathrm{Blocked}(P^i)\}$$

and

$$V_{\min}(P^i) = \{u \in \mathrm{Vertices}(P^i) \setminus \mathrm{Blocked}(P^i) : K(u, P^i) = K_{\min}(P^i)\}$$

will be called a topology-preserving discrete curve evolution (e.g., see Figure 6).

The question is whether this modified curve evolution might terminate prematurely. This would be the case if $\mathrm{Vertices}(P^i) = \mathrm{Blocked}(P^i)$. It can be shown that this does not happen, i.e., for $i = 0, \dots, m-1$ it holds that

$$\mathrm{Vertices}(P^i) \setminus \mathrm{Blocked}(P^i) \neq \emptyset.$$
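Combining the blocking test with the relevance measure of Eq. (2) gives a sketch of the topology-preserving evolution. Only unblocked vertices compete for $K_{\min}$. As a safeguard, the hypothetical function raises an error if every vertex is blocked, a situation that, by the statement above, should not arise for simple input polygons. The names and the example polygon are illustrative assumptions, and all helpers are included so that the block is self-contained.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def turn_angle(u: Point, v: Point, w: Point) -> float:
    """Absolute turn angle at v (radians, in [0, pi])."""
    a = math.atan2(v[1] - u[1], v[0] - u[0])
    b = math.atan2(w[1] - v[1], w[0] - v[0])
    return abs((b - a + math.pi) % (2 * math.pi) - math.pi)


def cross(o: Point, a: Point, b: Point) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_closed_triangle(p: Point, a: Point, b: Point, c: Point) -> bool:
    xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
    if not (min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)):
        return False
    d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    return not (min(d) < 0 < max(d))  # inside or on the boundary


def topology_preserving_dce(polygon: List[Point]) -> List[List[Point]]:
    """Stages P^0, ..., P^m; only unblocked vertices are eligible for deletion."""
    current = list(polygon)
    stages = [current]
    while len(current) > 3:
        n = len(current)
        perimeter = sum(math.dist(current[i], current[(i + 1) % n]) for i in range(n))
        eligible = {}  # index -> K(v, P^i) for every unblocked vertex
        for j in range(n):
            u, v, w = current[j - 1], current[j], current[(j + 1) % n]
            others = (current[k] for k in range(n)
                      if k not in {(j - 1) % n, j, (j + 1) % n})
            if any(in_closed_triangle(p, u, v, w) for p in others):
                continue  # v is in Blocked(P^i)
            l1, l2 = math.dist(v, u) / perimeter, math.dist(v, w) / perimeter
            eligible[j] = turn_angle(u, v, w) * l1 * l2 / (l1 + l2)  # Eq. (2)
        if not eligible:
            raise RuntimeError("all vertices are blocked")
        k_min = min(eligible.values())
        # blocked vertices map to None and are therefore always kept
        current = [v for j, v in enumerate(current) if eligible.get(j) != k_min]
        stages.append(current)
    return stages


u_shape = [(0, 0), (4, 0), (4, 4), (3, 4), (3, 1), (1, 1), (1, 4), (0, 4)]
for stage in topology_preserving_dce(u_shape):
    print(len(stage), stage)
```

The two bottom corners of the U stay blocked until the inner vertices have been removed, so the evolution never joins them across the arms of the U.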



5 Conclusions and Future Work 

We presented a discrete approach to curve evolution that is based on the obser- 
vation that in digital image processing and analysis, we deal only with digital 
curves that can be interpreted as polygonal curves without loss of information. 

The main properties of the proposed discrete evolution approach are the 
following: 
Fig. 6. The discrete curve evolution may introduce self-intersections, but after a small modification it is guaranteed to be topology-preserving.



(P1) Analogous to evolutions guided by diffusion equations, it leads to shape simplification, but

(P2) no blurring (i.e., shape rounding) effects occur, and
(P3) there is no dislocation of feature points.

(P4) It is stable with respect to noisy deformations.

(P5) It allows us to find line segments in noisy images.

These properties are not only justified by theoretical considerations but also by 
numerous experimental results. Additionally, the mathematical simplicity of the 
proposed evolution process makes various modifications very simple, e.g., by a 
simple modification, a set of chosen points can be kept fixed during the evolution. 

Our evolution method can also be interpreted as a hierarchical approximation of the original curve by a polygonal curve whose vertices lie on the original curve. Our approximation is fine-to-coarse and does not require any error parameters, in contrast to many standard approximations, which start with some initial coarse approximation of a curve and then split line segments that do not satisfy an error criterion (e.g., Ramer [11]). A newer and more sophisticated split-and-merge method for polygon approximation is presented in Bengtsson and Eklundh [1], where a multiscale contour approximation is obtained by varying an error parameter t, which defines a scale in a manner similar to diffusion scale-spaces. This implies problems similar to those of scale-spaces, e.g., the question of how to determine the step size for the parameter t. Additionally, the scale-space property of shape complexity simplification does not result automatically from the approach in [1], but is enforced ([1], p. 87): “New breakpoints, not appearing at finer scales, can occur but are then inserted also at finer levels.”

There are numerous possible applications of our curve evolution method in which scale-space representations play an important role, e.g., noise elimination and quality enhancement, shape decomposition into visual parts, salience measures of visual parts, and detection of critical or dominant points (Teh and Chin [13], Ueda and Suzuki [14]). The specific properties of our curve evolution yield additional applications, such as the detection of straight line segments in noisy images, which can be used for model-based shape recovery (Brunn et al. [3]), and polygonal approximation (cf. [1]).

A paper on a discrete surface evolution that is analog to the presented polyg- 
onal evolution is in preparation. 
Abstract. Multiresolution techniques are often used to shorten the execution times of dynamic programming based deformable contour optimization methods by decreasing the image resolution. However, the
speedup comes at the expense of contour optimality due to the loss of 
details and insufficient usage of the external energy in decreased res- 
olutions. In this paper, we present a new scale-space based technique 
for deformable contour optimization, which achieves faster optimization 
times and performs better than the current multiresolution methods. The 
technique employs a multiscale representation of the underlying images 
to analyze the behavior of the external energy of the deformable contour 
with respect to the change in the scale dimension. The result of this anal- 
ysis, which involves information theoretic comparisons between scales, is 
used in segmentation of the original images. Later, an exhaustive search 
on these segments is carried out by dynamic programming to optimize 
the contour energy. A novel gradient descent algorithm is employed to 
find optimal internal energy for large image segments, where the external 
energy remains constant due to segmentation. 

We present the results of our contour tracking experiments performed on 
medical images. We also demonstrate the efficiency and the performance 
of our system by quantitatively comparing the results with the multires- 
olution methods, which confirm the effectiveness and the accuracy of our 
method. 



1 Introduction 

A deformable contour [8] is an energy-minimizing model which is popularly used for automatic extraction and tracking of image contours. One of the main reasons for the popularity of deformable contours is their ability to integrate image-level bottom-up information, task-dependent top-down knowledge, and the desirable contour properties into a single optimization process.

* This work is supported by Grant No. R01 DC01758 from NIH and Grant No. IRI 961924 from NSF.

A deformable contour model has two types of energies associated with it: an internal
energy, which characterizes the desirable attributes of the contour, and an external energy, which ties the contour to the underlying image. The framework is based on minimizing the sum of these energies. Formally, a discretized version of a deformable contour is an ordered set of points $V = [v_1, v_2, \dots, v_n]$. Given an image $I$, the energy associated with a deformable contour, $E_{snake}$, can generally be written as

$$E_{snake}(V) = \sum_{i=1}^{n} \bigl(\alpha E_{int}(v_i) + \beta E_{ext}(v_i, I)\bigr) \qquad (1)$$

where $E_{int}$ is the internal and $E_{ext}$ is the external energy of the contour element $v_i$, and $\alpha$ and $\beta$ are the weighting parameters.

Although the main framework of the formulation has stayed more or less the same for the majority of applications, there have been numerous proposals for minimization techniques. The original proposal [8] and many others used variations of gradient descent algorithms for the minimization of Equation (1). While the internal energy definitions are suitable for optimizations based on gradient descent, external energies usually include large amounts of noise, which makes gradient descent methods prone to convergence to local instead of global minima, to numerical instability, and to inaccuracy.

Application of dynamic programming (DP) to deformable contour minimization [3] [5] addresses these problems. As we will explain in Section 2.1, DP solves the problems of optimality and numerical stability, and it allows hard constraints to be incorporated. However, although its time complexity is polynomial, DP suffers from long execution times. In order to shorten execution times for practical applications, researchers have commonly suggested [5] using a multiresolution framework. The main idea of using multiresolution techniques for DP is to decrease the number of degrees of freedom for each contour element. Since the underlying images are smaller at the lower resolutions, there are fewer image positions that a contour element can take, resulting in faster exhaustive enumeration. The details of current multiresolution techniques are explained in Section 2.2.

There are some problems with the above multiresolution techniques. First, 
during the construction of lower resolution levels, these techniques utilize the 
external energy of the deformable contour minimally. This is a serious prob- 
lem because only the external energy ties the deformable contour to the new 
resolution image. Another problem with the current multiresolution methods is 
that, while the image size is decreased, the fact that the new resolution image 
will be used in an exhaustive enumeration, which is a very costly process, is 
completely neglected. Neighboring image locations that will produce the same 
energies should be unified to one location. We describe the details of these prob- 
lems and a few others in Section 2.2. 

This paper addresses the above problems by employing a multiscale representation instead of a multiresolution representation. The method segments the underlying images by analyzing their structures with respect to the external energy in the scale-space. The segments are formed in such a way that, in the final segmentation, the information related to the external energy is kept close to its maximum; this is achieved by measuring the change of this information with respect to the change in scale through an information theoretic approach. A special dynamic programming technique [1] is then applied to optimize the energy of Equation (1) by using the centroids of these segments as the degrees of freedom. This paper extends our previous work [1] by employing a different segmentation technique that uses information theory and scale-space theory.

2 Snakes, DP, and Problems with Multiresolution Methods

In this section we define the deformable contour energies, the details of DP 
methods and multiresolution methods. We will also describe the problems that 
we address in detail. 

2.1 Snakes and DP 

Equation (1) describes the general form of the snake energy. The internal energy of the snake serves to impose smoothness and continuity of the contour. As mentioned earlier, the external energy, on the other hand, ties the contour to the underlying image by pushing the snake toward application-dependent image features like edges. One of the biggest advantages of using snakes is that specific applications can change the internal and external energy definitions without affecting the general framework.

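We define the internal energy as follows:

$$E_{int}(v_i) = \gamma\left(1 - \frac{\overrightarrow{v_{i-1}v_i}\cdot\overrightarrow{v_iv_{i+1}}}{|\overrightarrow{v_{i-1}v_i}|\,|\overrightarrow{v_iv_{i+1}}|}\right) + \Bigl|\,d - |\overrightarrow{v_iv_{i+1}}|\,\Bigr| \qquad (2)$$

where $\gamma$ is the weighting parameter and $d$ is the distance needed between the contour elements. The first part of this energy formulation, which is based on the dot product of two vectors (Figure 1), imposes smoothness of the contour.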
Given an image $I$, one possible definition for the external energy is

$$E_{ext}(v_i, I) = -|\nabla I(v_i)| \qquad (3)$$

which is the negative magnitude of the image gradient $\nabla I$ at $v_i$. Given the above formulations and an image $I$, we can extract and track object boundaries by defining a search window around each contour element and selecting the candidates from these search windows that minimize the snake energy (Figure 1). The desired contour, $V^* = [v_1^*, v_2^*, \dots, v_n^*]$, can be obtained by

$$V^* = \arg\min_{V} \sum_{i=1}^{n} \bigl(\alpha E_{int}(v_i) + \beta E_{ext}(v_i, I)\bigr) \qquad (4)$$
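Equation (3) and the search window can be illustrated on a tiny synthetic image. In this sketch, which is an assumption and not the authors' implementation, the gradient is approximated by central differences with indices clamped at the image border, `image[y][x]` holds grey values, and the function names are hypothetical. Choosing each candidate by its external energy alone ignores $E_{int}$; the DP formulation that follows optimizes both terms jointly.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def external_energy(image: Image, x: int, y: int) -> float:
    """E_ext(v, I) = -|grad I(v)| (Eq. (3)); central differences, indices clamped at the border."""
    h, w = len(image), len(image[0])
    gx = (image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]) / 2.0
    gy = (image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]) / 2.0
    return -math.hypot(gx, gy)


def best_in_window(image: Image, center: Tuple[int, int], half: int) -> Tuple[int, int]:
    """Candidate (x, y) with the lowest external energy in a (2*half+1)^2 search window."""
    cx, cy = center
    h, w = len(image), len(image[0])
    window = [(x, y)
              for y in range(max(cy - half, 0), min(cy + half, h - 1) + 1)
              for x in range(max(cx - half, 0), min(cx + half, w - 1) + 1)]
    return min(window, key=lambda p: external_energy(image, p[0], p[1]))


# 7x7 image with a vertical intensity ramp between columns 2 and 4.
edge_image = [[0, 0, 0, 50, 100, 100, 100] for _ in range(7)]
print(best_in_window(edge_image, (4, 3), 1))
```

In the 3x3 window around (4, 3), the candidate with the lowest external energy lies in column 3, where the grey-value ramp is steepest.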

Assuming there are $m$ different positions that the contour element $v_i$ can take in a search window $W_i$, the cost of iteratively testing each possible element configuration is $O(m^n)$, which grows exponentially. Fortunately, the optimization of the snake energy can be done in polynomial time using dynamic programming. Amini et al. [3] and Geiger et al. [5] proposed DP methods for deformable contour optimization. In this paper, we will use the formulation of Amini et al. Our system can easily be ported to the system of Geiger et al.

Fig. 1. Each contour element $v_i$ has a search window $W_i$ defined around it.

The main idea behind the DP method is that each contour element $v_i$ can take only one possible position from the search window $W_i$. We also observe that the energy of Equation (1) can be written in terms of separate energy terms $E_1, E_2, \dots, E_{n-2}$, such that each energy term $E_{i-1}$ depends only on $v_{i-1}, v_i, v_{i+1}$:

$$E_{snake}(v_1, v_2, \dots, v_n) = E_1(v_1, v_2, v_3) + E_2(v_2, v_3, v_4) + \dots + E_{n-2}(v_{n-2}, v_{n-1}, v_n)$$

where

$$E_{i-1}(v_{i-1}, v_i, v_{i+1}) = \alpha E_{int}(v_i) + \beta E_{ext}(v_i, I). \qquad (5)$$

Next we write a set of optimal value functions that hold the best energy configurations up to the current contour element:

$$S_1(v_2, v_3) = \min_{v_1} E_1(v_1, v_2, v_3)$$

$$S_2(v_3, v_4) = \min_{v_2} \bigl[E_2(v_2, v_3, v_4) + S_1(v_2, v_3)\bigr]$$

$$\vdots$$

$$S_{n-2}(v_{n-1}, v_n) = \min_{v_{n-2}} \bigl[E_{n-2}(v_{n-2}, v_{n-1}, v_n) + S_{n-3}(v_{n-2}, v_{n-1})\bigr]$$

Finally, we can write

$$\min E_{snake} = \min_{v_{n-1}, v_n} S_{n-2}(v_{n-1}, v_n).$$

Since each optimal value function is computed by iterating over three contour elements ($m^3$ combinations) and there are $n - 2$ of them, the time complexity of the DP algorithm is polynomial, namely $O(nm^3)$. The contour produced by the DP algorithm is globally optimal because it implicitly accounts for every possible configuration.
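The recursion for the optimal value functions $S_k$ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the internal energy is taken to be the squared discrete second difference (a hypothetical choice, since the text does not fix $E_{int}$ here), the external energy is any callable on a candidate position, and each search window is simply a list of candidate points. The triple loop over $(v_{i-1}, v_i, v_{i+1})$ makes the $O(nm^3)$ cost visible.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def internal_energy(a: Point, b: Point, c: Point) -> float:
    # Hypothetical E_int: squared discrete second difference (curvature) at b.
    return (a[0] - 2 * b[0] + c[0]) ** 2 + (a[1] - 2 * b[1] + c[1]) ** 2


def snake_dp(cands: Sequence[Sequence[Point]], ext: Callable[[Point], float],
             alpha: float = 1.0, beta: float = 1.0) -> Tuple[float, List[Point]]:
    """Minimise sum_i alpha*E_int(v_{i-1}, v_i, v_{i+1}) + beta*E_ext(v_i) over the
    interior elements, picking one candidate per search window (O(n m^3))."""
    n = len(cands)
    if n < 3:
        raise ValueError("need at least three contour elements")

    def term(a: Point, b: Point, c: Point) -> float:
        return alpha * internal_energy(a, b, c) + beta * ext(b)

    # S[j][k]: best partial energy with v_{i-1} = cands[i-1][j], v_i = cands[i][k].
    S = [[0.0] * len(cands[1]) for _ in cands[0]]
    back = []  # back[i-1][k][q]: best index of v_{i-1} given v_i = k, v_{i+1} = q
    for i in range(1, n - 1):
        prev, cur, nxt = cands[i - 1], cands[i], cands[i + 1]
        new_S = [[0.0] * len(nxt) for _ in cur]
        bp = [[0] * len(nxt) for _ in cur]
        for k, b in enumerate(cur):
            for q, c in enumerate(nxt):
                costs = [S[j][k] + term(a, b, c) for j, a in enumerate(prev)]
                j_best = min(range(len(prev)), key=costs.__getitem__)
                new_S[k][q] = costs[j_best]
                bp[k][q] = j_best
        S = new_S
        back.append(bp)

    # Final minimisation over the last pair (v_{n-2}, v_{n-1}).
    k, q = min(itertools.product(range(len(cands[n - 2])), range(len(cands[n - 1]))),
               key=lambda kq: S[kq[0]][kq[1]])
    best = S[k][q]
    idx = [0] * n
    idx[n - 2], idx[n - 1] = k, q
    for i in range(n - 2, 0, -1):  # backtrack v_{n-3}, ..., v_0
        idx[i - 1] = back[i - 1][idx[i]][idx[i + 1]]
    return best, [cands[i][idx[i]] for i in range(n)]


if __name__ == "__main__":
    # Toy example: 3x3 windows around (i, 0); a hypothetical "edge" lies on the row y = 1.
    windows = [[(x, y) for x in (i - 1, i, i + 1) for y in (-1, 0, 1)] for i in range(6)]
    energy, contour = snake_dp(windows, ext=lambda p: abs(p[1] - 1))
    print(energy, contour)  # energy is 0.0: a straight, evenly spaced contour on y = 1
```

Each stage keeps only the table of best partial energies over pairs $(v_i, v_{i+1})$ plus back-pointers, so memory grows as $O(nm^2)$; comparing the result against exhaustive search on a tiny instance is a convenient sanity check of the recursion.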

Although the time complexity of the DP algorithm is polynomial, it is still too slow for some practical applications. Combining DP with multiresolution methods addresses this problem, as explained in the next section.
Fig. 2. A multiresolution representation of an echocardiographic image: the leftmost image is the original 240x240 image, and the rightmost image is the 15x15 top-level image. Each pixel in this 15x15 image represents a 16x16 square segment marked on the original image.

2.2 DP with Multiresolution Methods and Problem Details 

We use the Gaussian pyramid [4] as the basic multiresolution method in explaining the general structure and describing the problems. Geiger et al. [5] use a different multiresolution scheme [6], which preserves discontinuities between image resolutions. However, most of the problems with the existing techniques are also present in their method.

A Gaussian pyramid for an image $I$ is a sequence of copies of $I$, where each successive copy has half the resolution and sample rate of its predecessor. The levels of a Gaussian pyramid for a given image $I$ are computed as

$$G(i, j, 0) = I(i, j)$$

$$G(i, j, k) = \sum_{m, n} w(m, n)\, G(2i - m, 2j - n, k - 1) \qquad (6)$$

where $k$ is the pyramid level and $w(m, n)$ is a small weighting kernel. The motivation for using a multiresolution method in snake optimization is that the lower the image resolution, the smaller the search windows, and hence the fewer candidate positions in each search window. Decreasing the resolution in a multiresolution representation may be viewed as dividing the original image into equal-sized square segments and representing each segment with a single pixel whose gray level is usually the average over the area around the segment (Figure 2). The deformable contour optimization algorithm is first applied to the highest level of the pyramid. The resulting contour approximates the final contour and is used as the initial snake position for the next lower level. The optimization is then performed at that level with a smaller window size, and the process continues until the contour is optimized at the lowest level, which is the original image. As expected, multiresolution-based DP does not necessarily produce optimal contours.
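A REDUCE step implementing Equation (6) can be sketched with NumPy. The text does not fix the weighting kernel $w(m, n)$; this sketch assumes the common separable 5-tap binomial kernel $[1, 4, 6, 4, 1]/16$ and reflected borders, both of which are illustrative choices. Four REDUCE steps turn a 240x240 image into the 15x15 top level of Figure 2, where each top-level pixel summarizes a 16x16 block.

```python
import numpy as np

# Assumed weighting kernel: separable 5-tap binomial w(m, n) = w(m) w(n).
W_1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def reduce_level(g: np.ndarray) -> np.ndarray:
    """One level of Equation (6): blur with w, then keep every second sample."""
    g = np.asarray(g, dtype=float)
    h, w = g.shape
    padded = np.pad(g, 2, mode="reflect")
    rows = sum(W_1D[t] * padded[:, t:t + w] for t in range(5))   # filter along x
    blurred = sum(W_1D[t] * rows[t:t + h, :] for t in range(5))  # filter along y
    return blurred[::2, ::2]


def gaussian_pyramid(image: np.ndarray, levels: int) -> list:
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid


if __name__ == "__main__":
    pyr = gaussian_pyramid(np.random.default_rng(0).random((240, 240)), 4)
    print([p.shape for p in pyr])  # (240, 240), (120, 120), (60, 60), (30, 30), (15, 15)
```

Because the kernel sums to one, a constant image stays constant at every level; the fixed halving of the sample grid is exactly what limits each coarse pixel to a fixed-size block of the level below.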

We mentioned before that only the external energy ties the deformable contour to the underlying image. However, the resolution-decreasing steps of Equation (6) make minimal use of the external energy, which increases the loss of external-energy-related information. We argue that, unlike in Figure 2, the pixels in the lowest-resolution image should represent different-sized segments in the original image. This would allow us to choose smaller segments in areas where the external energy varies more, resulting in a better representation of the external energy and less loss of external-energy-related information.

Another problem with the above multiresolution method concerns efficiency. The purpose of using a multiresolution method is to reduce the number of candidates in a search window so that the enumeration process becomes faster. Therefore, during the construction of the coarser resolutions, neighboring elements of a search window that would produce about the same energy should be merged into a single element. This modification can also be achieved by employing different-sized segments: we choose a larger segment for areas where the external energy remains relatively constant in the original image. This increases system efficiency without degrading performance, because at the upper levels of the pyramid we are not looking for the final contour but only for an approximation.

3 A New Scale-Space Based Approach for Deformable Contour Optimization

The discussion in the previous section of the problems of multiresolution DP methods suggested that, in order to utilize the external energy properly, each pixel in the lowest-resolution pyramid image should represent a variable-sized segment of the original image. However, this is very difficult to achieve with multiresolution techniques because of their inherent nature: a pixel in a pyramid level can only represent a fixed-size segment of the level below. Therefore, our new method does not use the multiresolution approach.

Our solution is based on scale-space techniques, which have received considerable attention in the computer vision field [10]. The main idea of a multiscale representation is to simplify the underlying image by removing fine-scale details as the scale increases continuously. This lets us analyze image structure with respect to scale; in other words, we can analyze how the image structure changes while the image undergoes a simplifying transformation. There are major differences between a multiresolution representation and a multiscale representation; Lindeberg discusses them thoroughly in [10], and we follow his terminology. As its name implies, a multiresolution representation decreases the image resolution while forming the pyramid levels. A multiscale representation, on the other hand, keeps the spatial sampling constant while the scale changes.

Our new method for deformable contour optimization forms a separate scale-space for the search window of each contour element $v_i$ of a snake $V$. Using an information-theoretic approach, we then analyze the behavior of the external energy under scale change to obtain a set of different-sized square segments of each search window. We apply a special dynamic programming optimization [1] using the centroids of these segments as the candidate positions for the optimized contour elements $v_i$. The resulting contour is used as the initial contour for the same kind of optimization with a smaller scale-space representation and smaller search windows. The process continues until each segment of the search windows corresponds to a single original image pixel, after which no further segmentation is meaningful.



3.1 Analyzing the Underlying Images and the Segmentation 

The scale-space for a search window is constructed by convolving the search window with Gaussian kernels of increasing standard deviation $\sigma$, sampled at discrete intervals. Given a search window $W_i$, we construct the scale-space $L_i(x, y; \sigma)$ by

$$L_i(x, y; \sigma) = g(x, y; \sigma) * W_i(x, y) = \iint \frac{1}{2\pi\sigma^2}\, e^{-\frac{\alpha^2 + \beta^2}{2\sigma^2}}\, W_i(x - \alpha, y - \beta)\, d\alpha\, d\beta \qquad (7)$$

where $L_i(x, y; 0) = W_i(x, y)$ and $g(x, y; \sigma)$ is the Gaussian kernel with standard deviation $\sigma$. Each sample of $\sigma$ is called a level of the scale-space. Levels are numbered starting from 0, which is the original image level. The scale of level $l$ is denoted by $\sigma_l$.
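Equation (7) keeps the sampling of $W_i$ fixed and only increases $\sigma$. A minimal NumPy sketch, assuming a sampled Gaussian truncated at $3\sigma$, reflected borders, and a hypothetical list of level scales $\sigma_l$ (with $\sigma_0 = 0$ for the original window), might look as follows. Unlike the pyramid, every level has the same size as the search window.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    radius = max(1, int(np.ceil(3.0 * sigma)))  # truncation at 3 sigma (an assumption)
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()


def smooth(window: np.ndarray, sigma: float) -> np.ndarray:
    """L(x, y; sigma) = g(x, y; sigma) * W(x, y), separable, same size as W."""
    window = np.asarray(window, dtype=float)
    if sigma == 0:
        return window.copy()  # level 0 is the original search window
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    h, w = window.shape
    padded = np.pad(window, r, mode="reflect")
    rows = sum(k[t] * padded[:, t:t + w] for t in range(len(k)))
    return sum(k[t] * rows[t:t + h, :] for t in range(len(k)))


def search_window_scale_space(window: np.ndarray, sigmas) -> list:
    # One entry per level l with scale sigmas[l]; the spatial sampling never changes.
    return [smooth(window, s) for s in sigmas]


if __name__ == "__main__":
    W = np.random.default_rng(0).random((64, 64))  # hypothetical 64x64 search window
    levels = search_window_scale_space(W, [0.0, 1.0, 2.0, 4.0, 8.0])
    print([lvl.shape for lvl in levels], [round(float(lvl.std()), 3) for lvl in levels])
```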

We first form all scale-spaces $L_i$, $i = 1, \ldots, n$, where $n$ is the number of contour elements. Then we segment each search window $W_i$ by analyzing the behavior of the external energy with respect to changes in $\sigma$. In other words, we would like to know how the external energy changes in various areas of the search window as the underlying image is simplified by increasing $\sigma$. If the external energy starts to behave differently, we conclude that the corresponding segment should be smaller in order to reflect the behavioral change better in the final segmentation. On the other hand, if the external energy behaves the same across scale changes, we conclude that a larger segment for that area will not reduce the external-energy-related information in the final segmentation. We prefer larger segments for efficiency, because larger segments mean fewer segments in a search window. This segmentation process addresses all the segmentation problems mentioned above.

In order to measure the behavioral change of the external energy with respect to the scale $\sigma$, we use an information-theoretic approach. Let $s^k_{i,j}$ be a segment of the search window $W_i$ defined on the image $L_i(x, y; \sigma_k)$, i.e. on level $k$ of the scale-space $L_i$. We can measure the amount of external-energy-related information by the Shannon entropy

$$H(s^k_{i,j}) = -\sum_x \sum_y p(s^k_{i,j}(x, y)) \ln p(s^k_{i,j}(x, y)) \qquad (8)$$

where

$$p(s^k_{i,j}(x, y)) = \frac{E_{Ext}(s^k_{i,j}(x, y))}{\sum_u \sum_v E_{Ext}(s^k_{i,j}(u, v))}.$$

Similar information-theoretic approaches have been used in scale-space studies by a number of researchers, including Niessen et al. [11] and Jagersand [7].
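Equation (8) can be evaluated per segment once the external energy has been computed on the corresponding scale-space level. The sketch below assumes the gradient-based energy of Equation (3), approximated with `np.gradient`; since all values then share one sign, the normalized values $p$ lie in $[0, 1]$. Evaluating it on the same segment at levels $k$ and $k+1$ gives the two entropies that Equation (9) compares; the sketch stops there and leaves their combination into $D$ to that equation.

```python
import numpy as np


def gradient_external_energy(level_image: np.ndarray) -> np.ndarray:
    # E_ext = -|grad L| as in Equation (3); central differences via np.gradient.
    gy, gx = np.gradient(np.asarray(level_image, dtype=float))
    return -np.hypot(gx, gy)


def segment_entropy(e_ext: np.ndarray) -> float:
    """Shannon entropy of Equation (8): p = E_ext / sum(E_ext), H = -sum p ln p."""
    e = np.asarray(e_ext, dtype=float).ravel()
    total = e.sum()
    if total == 0.0:
        raise ValueError("external energy sums to zero; p is undefined")
    p = e / total
    p = p[p > 0]  # 0 * ln 0 is taken as 0
    return float(-(p * np.log(p)).sum())


if __name__ == "__main__":
    level = np.random.default_rng(0).normal(size=(32, 32))  # hypothetical level L_i(.; sigma_k)
    energy = gradient_external_energy(level)
    print(segment_entropy(energy[:16, :16]))  # entropy of the top-left 16x16 segment
```

A segment whose energy is spread evenly has entropy close to $\ln(\text{number of pixels})$, while a segment dominated by a few strong edge pixels has much lower entropy.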
Fig. 3. Segmentation of the search window using different external energies. Please see the text for details.



We can measure the Shannon entropy $H(s^{k+1}_{i,j})$ of the same segment on the immediately upper level $k + 1$ in the same way. Finally, we obtain the normalized measure of the behavioral change of the external energy with respect to the change in $\sigma$ by

$$D(s^k_{i,j}) = \ldots \qquad (9)$$

The larger the value of $D(s^k_{i,j})$, the smaller the segment should be.

The details of our segmentation are as follows. For a given $m \times m$ search window $W_i$, we form the scale-space $L_i$ up to level $l$. The elements of the image at level $k$ of this scale-space can be accessed directly as $L_i(x, y; \sigma_k)$. We then form a set of segments $S_i$ with four initial segments at scale-space level $l - 1$:

$$S_i = \{ s^{l-1}_{i,1}, s^{l-1}_{i,2}, s^{l-1}_{i,3}, s^{l-1}_{i,4} \} \qquad (10)$$

Each of these segments is $m/2 \times m/2$, and they are not allowed to overlap; in other words, we divide scale-space level $l - 1$ into four equal-sized squares. We then choose the segment $s^{l-1}_{i,j}$ in $S_i$ that gives the largest value of Equation (9), i.e. the segment with the highest behavioral change with respect to the change in scale. $s^{l-1}_{i,j}$ is removed from $S_i$, and we add to $S_i$ four new non-overlapping segments, each $m/4 \times m/4$, defined on scale-space level $l - 2$ at the position of $s^{l-1}_{i,j}$. The process continues by removing the segment with the largest behavioral change value and adding four new square segments defined on the immediately lower scale level, until the number of segments in $S_i$ reaches a user-determined value.
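The split loop just described behaves like a priority queue keyed on the behavioral-change measure. The sketch below is hypothetical in several respects: segments are stored as `(x0, y0, size, level)` tuples, the measure $D$ of Equation (9) is passed in as a callable, $m$ is assumed to be a power of two, and segments that are a single pixel or already on level 0 are treated as unsplittable. Each split replaces one segment by four, so the count grows 4, 7, 10, and so on; the loop stops once the requested number is reached or exceeded. The returned centroids are the candidate positions handed to the DP step.

```python
import heapq
from typing import Callable, List, NamedTuple, Tuple


class Segment(NamedTuple):
    x0: int     # left column inside the search window
    y0: int     # top row inside the search window
    size: int   # side length in original pixels
    level: int  # scale-space level the segment is defined on


def quadrants(seg: Segment, level: int) -> List[Segment]:
    """Four non-overlapping half-size squares of `seg`, defined on `level`."""
    h = seg.size // 2
    return [Segment(seg.x0 + dx, seg.y0 + dy, h, level) for dy in (0, h) for dx in (0, h)]


def segment_window(m: int, top_level: int, target: int,
                   change: Callable[[Segment], float]) -> List[Segment]:
    """Split the segment with the largest behavioral change until `target` segments exist.
    `change` stands in for D of Equation (9) and is a hypothetical callable here."""
    if m & (m - 1) or top_level < 1:
        raise ValueError("m must be a power of two and top_level >= 1")
    heap: List[Tuple[float, int, Segment]] = []
    counter = 0  # tie-breaker so Segments themselves are never compared
    for s in quadrants(Segment(0, 0, m, top_level), top_level - 1):  # Equation (10)
        heap.append((-change(s), counter, s))
        counter += 1
    heapq.heapify(heap)
    final: List[Segment] = []  # segments that can no longer be split
    while heap and len(heap) + len(final) < target:
        _, _, s = heapq.heappop(heap)
        if s.size < 2 or s.level < 1:
            final.append(s)
            continue
        for child in quadrants(s, s.level - 1):
            heapq.heappush(heap, (-change(child), counter, child))
            counter += 1
    return final + [s for _, _, s in heap]


def centroids(segments: List[Segment]) -> List[Tuple[float, float]]:
    # Candidate positions for the DP step: segment centres in window coordinates.
    return [(s.x0 + (s.size - 1) / 2.0, s.y0 + (s.size - 1) / 2.0) for s in segments]


if __name__ == "__main__":
    # Hypothetical D that flags the top-left corner as changing most with scale.
    toy_d = lambda s: 1.0 if (s.x0 < 16 and s.y0 < 16) else 0.0
    segs = segment_window(64, 4, 25, toy_d)
    print(len(segs), sorted({s.size for s in segs}))
```

For example, a target of 25 segments is reached after seven splits, with small segments concentrated where the supplied measure is large and large segments elsewhere.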

Figure 3-a shows a midsagittal ultrasound image of the tongue with the initial contour points superimposed. Figure 3-b shows the search window of the marked contour element segmented using an image-intensity-based external energy. Figure 3-c shows the same window segmented using the external energy defined by Equation (3). Figure 3-d shows the same search window segmented using an external energy that is sensitive to the image gradient magnitude and to the tangent angle of the contour at the marked contour element. Each segment is shaded with a random gray level for visualization purposes. As the figures show, the final segmentations differ for different types of energy, which should be reflected in the DP optimization process by producing better contours.

Number of segments   Multiscale   Multiresolution   Constant image
64^2                 8.1924       8.1924            8.3178
32^2                 6.7940       6.8194            6.9315
16^2                 5.3886       5.4500            5.5452
8^2                  4.0486       4.0863            4.1589
4^2                  2.6926       2.7301            2.7726
2^2                  1.3669       1.3669            1.3863

Table 1. Average Shannon entropy values for the segmentations by multiscale and multiresolution methods.

Using information theory, we can measure the information carried by a set of segments $S_i$ by

$$H(S_i) = -\sum_j r(\bar{s}_{ij}) \ln r(\bar{s}_{ij}) \qquad (11)$$

where $r(\bar{s}_{ij}) = E_{Ext}(\bar{s}_{ij}) / \sum_u E_{Ext}(\bar{s}_{iu})$, $s_{ij}$ is an element of the segment set $S_i$, and $\bar{s}_{ij}$ is the average gray-level value of the segment $s_{ij}$. In order to demonstrate that our scheme produces sets of segments with more external-energy-related information, we performed experiments on medical images, measuring the information of the produced segment sets using Equation (11). We then measured, with the same formula, the information of the equal-sized square segmentation (Figure 2) used by the usual multiresolution methods. We also measured the information produced by segmenting a constant gray-level image, which has the least possible information. Note that the type of segmentation does not matter for the constant image, because Equation (11) yields the same value for any segmentation with the same number of segments. Experiments were performed on the ultrasound image shown in Figure 3-a by taking a 64 by 64 search window around each contour element and segmenting each search window using both our multiscale method and the multiresolution method. Finally, for each segment set we measured the information produced and took the average. Table 1 shows these average information values for our multiscale method and for the multiresolution method. As the table shows, our method carries more information than the multiresolution method, because the difference between the multiscale values and the constant-image values is greater than the difference between the multiresolution values and the constant-image values. Figure 4 shows this visually, with the average information values normalized by dividing them by the constant-image information value. As expected, both methods produce the same information value when the number of segments is $64^2$, because each segment then corresponds to an original image pixel and both methods produce the same segmentation. Similarly, both methods produce the same information value when the number of segments is $2^2$, because our multiscale method initializes the segment set $S_i$ with equal-sized segments, as in Equation (10). Figure 4 also shows that our method carries much more external-energy-related information when the number of segments is around $16^2$, which is the most widely used case.
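Equation (11) is again a Shannon entropy, now over one external-energy value per segment. The sketch below takes those per-segment values as given (how they are obtained from the average gray levels depends on the chosen external energy and is left to the caller). When all segments receive the same value, as for a constant image, the result is $\ln N$ for $N$ segments, which reproduces the constant-image column of Table 1; dividing by that value gives the normalization used in Figure 4.

```python
import math

import numpy as np


def segment_set_information(segment_energies) -> float:
    """Equation (11): -sum_j r_j ln r_j with r_j = E_j / sum_u E_u, one value per segment."""
    e = np.asarray(segment_energies, dtype=float)
    r = e / e.sum()
    r = r[r > 0]  # 0 * ln 0 is taken as 0
    return float(-(r * np.log(r)).sum())


def normalized_information(segment_energies) -> float:
    # Figure 4 normalization: divide by the constant-image value ln(N) for N segments.
    return segment_set_information(segment_energies) / math.log(len(segment_energies))


if __name__ == "__main__":
    for side in (64, 32, 16, 8, 4, 2):
        value = segment_set_information(np.ones(side * side))  # constant image
        print(f"{side}^2 segments: {value:.4f}")
    # 8.3178, 6.9315, 5.5452, 4.1589, 2.7726, 1.3863 (Table 1, constant image column)
```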
Fig. 4. Normalized average Shannon entropy values for Table 1.

Although there are major differences, our segmentation method resembles quadtree-type segmentation methods [9]. Our method uses scale-space techniques to analyze the behavior of the external energy with respect to the change in scale to decide which segment to divide. Quadtrees, on the other hand, do not consider scale changes; they simply use the variance in the image to decide which segment to divide.

4 Experiments 

We tested our system on medical images, which are known to be very problematic for contour analysis. To show the performance of our system, we compared the contours of the multiresolution methods with those of our multiscale method at the highest level of the pyramid. We restrict the comparison to this level because the final results are corrected by the contour optimizations at the lowest levels, which are the same for our method and for the multiresolution methods. This paper presents two of the test sets that we used.

The first test set is a sequence of midsagittal ultrasound images of the tongue during speech. In addition to the usual ultrasound imaging problems, open contours and application-specific problems make contour analysis of these sequences very difficult [2]. Figure 5-(a) shows the contours tracked by our system for four frames, and Figure 5-(b) shows the contours tracked by the multiresolution method. Our method takes about 28 seconds of CPU time for each contour; the multiresolution method takes about 43 seconds. We compared these contours against ground truth obtained by a non-multiresolution dynamic programming system, which is guaranteed to give optimal results. The comparison measures the distances between corresponding contour element positions of the two contours. Our system produced an average difference of 6.12 pixels; the multiresolution method produced an average difference of 12.91 pixels.

The experiments on ultrasound images confirmed the accuracy of our system. Next, we wanted to see whether we could achieve the same performance using a smaller number of segments, which would result in faster execution times. For this experiment we used a sequence of four frames of right anterior oblique (RAO) view contrast ventriculogram (CV) images from a normal human subject. Since they are less noisy and the contours are closed, MRI images are easier to analyze. Manually detected contours were used to verify our results. First, we ran the multiresolution method on the sequence, using the first frame's manually detected contour as the initial contour position for all frames in the sequence. We used 64 points for each 32 by 32 search window. Each contour extraction took an average of 26.78 CPU seconds, and the average contour element difference from the manually detected contours was 2.88 pixels. We then repeated the experiment using our multiscale method, with only 25 points (segments) per search window to speed up the optimization. The average contour element difference from the manually detected contours was 2.87 pixels, almost the same as for the multiresolution method. However, the time taken for each contour optimization differed greatly: our method took an average of only 7.42 CPU seconds per contour extraction. Figure 5-(c) shows the tracking results from our system.

Fig. 5. Tracking results from (a) the multiscale method and (b) the multiresolution method; (c) results of our system applied to an MRI heart image sequence.
5 Conclusions

We presented a new multiscale approach for dynamic-programming-based deformable contour minimization. The system introduces a number of novel ideas that could be valuable for discovering new uses of scale-spaces in model-based analysis of 2D and 3D images and image sequences. Our experiments confirmed that the new method achieves faster optimization and performs better than current dynamic programming optimization methods based on multiresolution techniques.

Our method reduces the number of possible locations that a contour element can take, dramatically shortening the execution time of the optimization. Although multiresolution methods use the same idea, our approach uses scale-space techniques to obtain a better set of candidate positions, which makes the optimization faster and improves performance. Using information theory, the system analyzes the behavior of the external energy with respect to scale change. This analysis tells us how to segment the underlying images so that the reduced set of candidate positions carries more external-energy-related information. A previously developed dynamic programming method [1] is used to optimize the contour energy on these points to produce the final contours. The system can be generalized to other deformable contour and deformable model applications by adapting the internal and external energies and the segmentation algorithm to the specific needs of the application.
Abstract. A simple derivation of properties of a normal white noise random field in linear scale-space is presented. The central observation is that the random field has a scaling invariance property. From this invariance it is easy to derive the scaling behaviour of measurements made on normal white noise random fields.



1 Introduction 

Properties of normal white noise in scale-space have been studied previously for a number of reasons. Images may be corrupted by noise, and scale-space smoothing may improve the signal-to-noise ratio. Noise has served as a model to study the behaviour across scales of properties such as the number of local extrema or the volume of grey-level blobs [4]. Deviations from the scaling behaviour of properties of white noise or of ensembles of natural images [5] may provide useful information to a visual system. Apart from the covariance of normal white noise in scale-space [1], results have been obtained mostly by simulation. The purpose of this paper is to illustrate that some useful results are available analytically.



2 An invariance of noise in scale-space 



It is well known [3] that the only functions that are form invariant under linear scale-space filtering are the derivative of Gaussian functions

$$G^{n}(x; t) = \frac{\partial^{|n|}}{\partial x_1^{n_1} \cdots \partial x_N^{n_N}} \frac{1}{(2\pi t)^{N/2}}\, e^{-\frac{x \cdot x}{2t}}$$

Filtering these functions with a Gaussian kernel $G^{0}$ is equivalent to a rescaling, as expressed by the invariance $x \to s x$, $t \to s^2 t$, $G^{n} \to s^{-N-|n|} G^{n}$, or $G^{n}(x; t) = s^{N+|n|} G^{n}(s x; s^2 t)$. $N$ denotes the dimension of space, $x \in \mathbb{R}^N$, $n = (n_1, \ldots, n_N)$ specifies the derivative operator, and $|n| = \sum_i n_i$ its order. The square root of the second argument $0 < t \in \mathbb{R}$ is the "scale" of $G^{n}$.

There is also a family of random fields that is invariant under scale-space filtering with a kernel $G^{0}$, in the sense that filtering the random field is equivalent to rescaling its joint distribution function.



M. Nielsen et al. (Eds.): Scale-Space’99, LNCS 1682, pp. 423—428, 1999. 
© Springer- Verlag Berlin Heidelberg 1999 




424 P. Majer 



Members $\xi^{n}(x; t)$ of this family are generated (and defined) by filtering a normal white noise $\xi^{0}(x; 0)$ of zero mean and standard deviation $\sigma$ with a derivative of Gaussian filter kernel $G^{n}$:

$$\xi^{n}(x; t) = \left( G^{n}(\cdot; t) * \xi^{0}(\cdot; 0) \right)(x)$$

These normal random fields are completely determined by their autocovariance function

$$\gamma^{n}(x - x', t + t') = \sigma^2 (-1)^{|n|}\, G^{2n}(x - x'; t + t')$$

that describes the covariance of $\xi^{n}(x; t)$ and $\xi^{n}(x'; t')$. It follows immediately that the form invariance of $G^{n}$ is inherited by the random fields:

$$\gamma^{n}(x - x', t + t') = s^{2|n|} s^{N}\, \gamma^{n}\left( s(x - x'), s^2 (t + t') \right) \qquad (1')$$

The (joint distribution function of the) random field is invariant under the rescaling

$$x \to s x, \qquad t \to s^2 t, \qquad \sigma \to s^{-|n|} s^{-N/2} \sigma \qquad (1)$$
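The amplitude part of the rescaling (1) can be checked numerically without any random numbers: for discrete unit-variance white noise, the variance of $\xi^{n}(x; t)$ is the sum of the squared filter taps. The sketch below is one-dimensional ($N = 1$) and uses sampled Gaussian and first-derivative kernels on the integer grid; the sampling and the truncation radius are assumptions of the sketch. It compares the variances at $t$ and $s^2 t$, whose ratio should be $s^{2|n| + N}$, the square of the factor $s^{|n|} s^{N/2}$ by which the standard deviation changes.

```python
import math


def sampled_kernel(t: float, order: int, radius_factor: float = 8.0) -> list:
    """Sampled 1-D G(x; t) (order 0) or dG/dx (order 1) on the integer grid."""
    r = int(math.ceil(radius_factor * math.sqrt(t)))
    taps = []
    for x in range(-r, r + 1):
        g = math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
        taps.append(g if order == 0 else -x / t * g)
    return taps


def filtered_noise_variance(t: float, order: int, sigma: float = 1.0) -> float:
    # Independent noise samples: Var = sigma^2 * sum of squared taps.
    return sigma ** 2 * sum(v * v for v in sampled_kernel(t, order))


if __name__ == "__main__":
    s, t, N = 2.0, 64.0, 1
    for order in (0, 1):
        ratio = filtered_noise_variance(t, order) / filtered_noise_variance(s * s * t, order)
        print(order, ratio, s ** (2 * order + N))  # measured vs predicted s^(2|n| + N)
```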

Figure (1) displays a one-dimensional realization of normal noise at scale $\sqrt{t} = 8$, the same realization filtered to scale 32 ($t = 32^2$), and lastly a rescaled display of the first graph.



[Figure 1 panels: "noise at scale 8", "noise at scale 32", "window of noise at scale 8"]

Fig. 1. Normal noise at scale $\sqrt{t} = 8$, filtered to scale 32, and a rescaled display of the noise at scale 8 showing only $0 < x < 256$.



Obviously the particular function that we have realized is not scaling in- 
variant. Filtering the function in the first graph results in the second which is 
apparently different from the third graph that shows the appropriately rescaled 
version of the first. However, the similarity of these graphs serves to illustrate 
the fact that they are generated by identical random mechanisms, i.e. that the 
random field is scaling invariant. 

The invariance of normal noise under the scaling transformation (1) allows one to derive the scaling behaviour of any observation made on a random field $\xi^{n}(x; t)$. Some examples follow.
3 Density of local extrema

The number of local extrema of normal white noise in scale-space has been 
studied as a model for the scale dependence of the number of features in a signal 
[4]. Computation of the expected value of the number of local extrema at a 
fixed scale is extremely difficult [4]. However, the scale invariance property (1) 
of normal white noise directly gives a relationship between the distributions of 
the numbers of local extrema at different scales. 

From (1) we find that the distribution of the number of local extrema in a volume $V$ of space at $t$ is identical to the distribution of the number of local extrema in a volume $s^N V$ at $s^2 t$. More specifically:

- the probability $P_t(N^{Sp})$ of observing fewer than $N^{Sp}$ local extrema in a unit volume ($\int dx = 1$) of filtered white noise $\xi^{n}(x; t)$ at scale $\sqrt{t}$ is related to $P_{s^2 t}(N^{Sp})$ at scale $s\sqrt{t}$ by

$$P_t(N^{Sp}) = s^N P_{s^2 t}(N^{Sp}) \qquad (2)$$

- the expected number $E(N^{Sp})$ of local extrema over space per unit volume of space behaves as

$$E(N^{Sp}) \propto t^{-N/2} \qquad (3)$$

Note that (2) and (3) hold for any derivative $n$ in $\xi^{n}(x; t)$.

Similar relations hold for the distribution and expectation of the number $N^{ScSp}$ of local extrema over scale and space per unit volume of scale and space:

$$E(N^{ScSp}) \propto t^{-N/2-1} \qquad (4)$$

The scale dependence (4) of the number of local extrema over scale and space is validated by simulation experiments. Figure (2) shows a plot of $\log N^{ScSp}$ against $\log t$ for one-dimensional and two-dimensional white noise.
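A one-dimensional simulation of the density of extrema over space (Equation (3) with $N = 1$) could look like the sketch below. It is illustrative only, with hypothetical choices: NumPy's seeded generator, a sampled Gaussian truncated at $6\sqrt{t}$, and extrema detected as sign changes of first differences. By the invariance (1), doubling the scale $\sqrt{t}$ ($s = 2$) should roughly halve the density of extrema, since the density falls as $t^{-1/2}$ when $N = 1$.

```python
import numpy as np


def gaussian_kernel(t: float) -> np.ndarray:
    r = int(np.ceil(6.0 * np.sqrt(t)))
    x = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * t))
    return g / g.sum()


def extrema_density(t: float, n_samples: int = 2**20, seed: int = 0) -> float:
    """Local extrema per sample of 1-D white noise smoothed to scale sqrt(t)."""
    noise = np.random.default_rng(seed).standard_normal(n_samples)
    smooth = np.convolve(noise, gaussian_kernel(t), mode="valid")  # drop border samples
    d = np.diff(smooth)
    n_extrema = np.count_nonzero(d[:-1] * d[1:] < 0)  # slope changes sign at an extremum
    return n_extrema / smooth.size


if __name__ == "__main__":
    d16, d64 = extrema_density(16.0), extrema_density(64.0)
    print(d16, d64, d16 / d64)  # the ratio should be close to 2
```

A two-dimensional version or the count over scale and space in Equation (4) follows the same pattern but needs a neighbourhood test in the corresponding number of dimensions.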




Fig. 2. Log-Log plot of the number of local extrema against scale for a one-dimensional 
(top curve) and a two-dimensional normal white noise. The theoretical curves are 
depicted as lines. 
4 Edge lengths

The distribution of edge lengths $l$ in normal noise $\xi^{n}(x; t)$ at scale $\sqrt{t}$ is identical to the distribution of scaled edge lengths $sl$ at scale $s\sqrt{t}$. Again, this scaling invariance results directly from the invariance of the distribution of normal white noise under the scaling transformation (1), without the need to actually compute the distribution of edge lengths. Note that for this scaling behaviour to hold, it is essential that edges are computed by an algorithm that commutes with the scaling transformation, e.g. zero crossings of differential invariants.

Let us denote by $P_t(l)$ the relative frequency of edges of length less than $l$ in the set of all edges at scale $\sqrt{t}$ in a normal noise image $\xi^{n}(x; t)$. $P_t(l)$ is identical to the probability $P_{s^2 t}(sl)$ of edges of length less than $sl$ occurring in a filtered image $\xi^{n}(x; s^2 t)$:

$$P_t(l) = P_{s^2 t}(sl)$$

so that the expected edge length grows linearly in scale $\sqrt{t}$:

$$E(l) \propto \sqrt{t} \qquad (5)$$

as shown on the left of figure (3).





Fig. 3. Mean length of edges $E(l)$ as a function of scale $\sqrt{t}$. Left: without border effects; right: with border effects. Theoretical relations are shown as lines.



4.1 Edge lengths with border effects

In contrast to dimensionless features, the distribution of edge lengths is affected by the image border, which cuts some edges short. We therefore attempt to describe the effect of this on the distribution of edge lengths.

Consider a two-step procedure to arrive at the measured edge lengths. First, edges are computed from a hypothetical borderless image. Then this image is cropped to the observed image size, which cuts some edges into two pieces. One piece of each cut edge is kept: with probability one half it is the long piece and with equal probability the short piece, so that the expected length after cutting is one half that before. If we denote by $p^c_t(l)$ the probability density of lengths of edges cut by the image border, we have

$$p^c_t(l) = 2 p_t(2l)$$

where $p_t(l)$ is the density of lengths $l$, i.e. $P_t(l) = \int_0^l du\, p_t(u)$. Each edge has a certain probability $p^b$ of lying on the border of the image, and this probability depends on the length $l$ of the edge. If we assume that $p^b$ is linear in $l$ (which should be a good assumption as long as the edge length is smaller than the length of the image), it scales like

$$p^b(l) = s^{-1} p^b(sl)$$

The density of observed lengths at scale $\sqrt{t}$,

$$\left( 1 - p^b(l) \right) p_t(l) + p^b(l)\, p^c_t(l),$$

then scales to

$$\left( 1 - s^{-1} p^b(sl) \right) s\, p_{s^2 t}(sl) + 2 p^b(sl)\, p_{s^2 t}(2sl).$$

Thus the mean length depends on $t$ as

$$E(l) \propto \sqrt{t} - a t$$

with a constant $a$ that depends inversely on the length of the image border and on the edge detection and linking algorithm used. Figure (3) shows a fit of the scale dependence of edge lengths in 512 by 512 pixel white noise images in scale-space. As the edge detection and linking algorithm we used Canny's non-maximum suppression and hysteresis thresholding [2] (for thresholding see below).
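The step from the scaled density to $E(l) \propto \sqrt{t} - at$ can be made explicit to first order. The derivation below is a sketch under two assumptions not spelled out in the text: the border probability is exactly linear, $p^b(l) = c\,l$ with small $c$, and the mixture density is renormalized before taking the mean. Writing $p_t(l)$ for the edge-length density at scale $\sqrt{t}$ and $m_k = \int_0^\infty l^k p_t(l)\,dl$, the invariance $P_t(l) = P_{s^2 t}(sl)$ gives $m_1 = A\sqrt{t}$ and $m_2 = B t$ for constants $A$ and $B$.

```latex
f_t(l) = (1 - c\,l)\,p_t(l) + c\,l \cdot 2\,p_t(2l)

\int_0^\infty f_t(l)\,dl = 1 - \tfrac{c}{2}\,m_1, \qquad
\int_0^\infty l\,f_t(l)\,dl = m_1 - \tfrac{3c}{4}\,m_2

E(l) = \frac{m_1 - \tfrac{3c}{4}\,m_2}{1 - \tfrac{c}{2}\,m_1}
     = m_1 - c\left(\tfrac{3}{4}\,m_2 - \tfrac{1}{2}\,m_1^2\right) + O(c^2)
     = A\left(\sqrt{t} - a\,t\right) + O(c^2),
\qquad a = \frac{c\,(3B/4 - A^2/2)}{A}
```

Since $m_2 \ge m_1^2$ (so $B \ge A^2$), the constant satisfies $a \ge cA/4 > 0$. Because the chance that an edge of given length touches the border falls as the image grows, $c$, and with it $a$, decreases with the border length, consistent with the statement above.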

5 Blob volumes 

Volumes of so-called grey-level blobs have been used to construct a systematic approach for the extraction of important structures in images [4]. Their significance was assessed by comparison with the expected blob volume in normal noise.

For the analysis of their scale dependence in normal noise, it suffices to know that grey-level blob volumes are integrals of the (smoothed) intensity function over regions of the image domain, and that the regions are defined by geometric properties of the intensity [4]. Irrespective of whether each region grows or shrinks with increasing scale, the number of regions decreases like $t^{-N/2}$, and thus their average area $A$ increases like

$$E(A) \propto t^{N/2}$$

More generally, the distribution of areas of the regions of integration shows the invariance $P_t(A) = P_{s^2 t}(s^N A)$.

The values of the intensity function depend on scale as $t^{-N/4}$, so that the integrals over the above areas depend on scale like $t^{N/2 - N/4}$:

$$E(\text{blob volume}) \propto t^{N/2 - N/4} \qquad (6)$$

as reported in simulation studies by Lindeberg [4].



6 Scale dependent thresholds 



The scale dependencies described above hold only when the measurements commute with the scaling transformation. Introducing a fixed threshold into Canny's edge detection and hysteresis algorithm, for example, would destroy the scale dependence shown in figure (3).

Thresholds may, however, be modified to depend on scale such that their relative position within the distribution of values they are applied to is independent of scale. Or, conversely, the distribution of values to be thresholded may be rescaled. In edge detection, a threshold on the absolute value of the gradient should be proportional to $t^{-1}$ (for a two-dimensional image). Alternatively, as in figure (3), a fixed threshold was used and 'standardized' gradients were computed, i.e. gradients obtained with derivative of Gaussian kernels built from $e^{-\frac{x \cdot x}{2t}} / (2\pi t)^{N/2}$ and normalized to unit power. The use of standardized derivatives is superior to a scale-dependent threshold in that it may be checked numerically by setting the power of the filter kernel equal to 1.
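The difference between a scale-dependent threshold and standardized gradients can be checked numerically. The sketch below assumes that "standardized" means dividing the derivative of Gaussian kernel by the square root of its power (the sum of squared taps), so that filtered unit white noise has unit variance at every scale; the sampling grid and truncation radius are also assumptions. For $N = 2$ the raw $x$-derivative kernel has power proportional to $t^{-2}$, so its output standard deviation, and hence a matched threshold, scales like $t^{-1}$.

```python
import numpy as np


def gaussian_dx_kernel(t: float, radius_factor: float = 6.0) -> np.ndarray:
    """Sampled x-derivative of G(x; t) = exp(-x.x / (2t)) / (2 pi t) in 2-D (N = 2)."""
    r = int(np.ceil(radius_factor * np.sqrt(t)))
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    g = np.exp(-(x**2 + y**2) / (2.0 * t)) / (2.0 * np.pi * t)
    return -x / t * g


def power(kernel: np.ndarray) -> float:
    # Variance of unit-variance white noise after filtering with `kernel`.
    return float(np.sum(kernel**2))


def standardized(kernel: np.ndarray) -> np.ndarray:
    # Assumed meaning of 'standardized': rescale the kernel to unit power.
    return kernel / np.sqrt(power(kernel))


if __name__ == "__main__":
    for t in (4.0, 16.0, 64.0):
        k = gaussian_dx_kernel(t)
        print(t, power(k), power(standardized(k)))
    # Raw power drops by about 16x per factor 4 in t; standardized power stays 1.
```

With standardized kernels a single fixed threshold sits at the same position in the noise distribution at every scale, which is the property the text relies on for figure (3).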



7 Summary 

Scale dependencies of distributions of properties of white noise in scale-space 
were derived from a scaling invariance of normal random fields. The method 
is usually much simpler than a direct computation of the distribution at fixed 
scales and subsequent derivation of the scale dependence. 
Abstract. In this paper we summarize the main features of a new time-dependent model to approximate the solution to the nonlinear total variation optimization problem for deblurring and noise removal introduced by Rudin, Osher and Fatemi. Our model is based on level set motion whose steady state is quickly reached by means of an explicit procedure based on an ENO Hamilton-Jacobi version of Roe's scheme. We show numerical evidence of the speed, resolution and stability of this simple explicit procedure in two representative 1D and 2D numerical examples.



1 Introduction 

The Total Variation (TV) deblurring and denoising models are based on a constrained variational problem that uses the total variation norm as a nonlinear, nondifferentiable functional. These models were first formulated by Rudin, Osher and Fatemi [10] for denoising, and by Rudin and Osher [9] for the combined denoising and deblurring case. Their main advantage is that their solutions preserve edges very well and avoid ringing, but they pose computational difficulties. Indeed, although the variational problem is convex, the Euler-Lagrange equations are nonlinear and ill-conditioned. Linear semi-implicit fixed-point procedures devised by Vogel and Oman (see [11]) and interior-point primal-dual implicit quadratic methods by Chan, Golub and Mulet (see [3]) were introduced to solve the models. These methods give good results for pure denoising problems, but they become highly ill-conditioned in the combined deblurring and denoising case, where the computational cost is very high and parameter dependent. Furthermore, these methods also suffer from the undesirable staircase effect, namely the transformation of smooth regions (ramps) into piecewise constant regions (stairs).

In [5], a very simple time dependent model was constructed by evolving the Euler-Lagrange equation of the Rudin-Osher optimization problem, multiplied by the magnitude of the gradient of the solution. The two main analytic features of this formulation were the following: 1) the level contours of the image move quickly to the steady solution, and 2) the presence of the gradient numerically regularizes the mean curvature term in a way that preserves and enhances edges and removes noise through the nonlinear diffusion acting on small scales. To approximate the solution we used a higher order accurate ENO version of Roe's scheme for the convective term, and central differencing for the regularized mean curvature diffusion term. This explicit procedure is very simple, stable and computationally fast compared with other semi-implicit or implicit procedures. We show numerical evidence of the resolution and stability of this explicit procedure in two representative 1D and 2D numerical examples, consisting of a noisy and blurred signal and a noisy image (we have used Gaussian white noise and Gaussian blur). We have observed in our experiments that our algorithm shows a substantially reduced staircase effect; we give an explanation for this in Section 3.

2 Deblurring and Denoising 

Let us denote by $u_0$ the observed image and by $u$ the real image. A model of blurring comes from the degradation of $u$ through some kind of averaging. The model of degradation we assume is

$$j * u + n = u_0, \qquad (1)$$

where $n$ is Gaussian white noise, i.e., the values $n_i$ of $n$ at the pixels $i$ are independent random variables, each with a Gaussian distribution of zero mean and variance $\sigma^2$, and $j(x,y)$ is a kernel, the blurring being defined through the convolution

$$(j*u)(x,y) = \int_\Omega u(s,r)\, j(x-s,\, y-r)\, ds\, dr. \qquad (2)$$

For the sake of simplicity, we suppose that the blurring comes from a convolution through a kernel function $j$ such that $u \mapsto j * u$ is a self-adjoint compact integral operator. For any $a > 0$ the so-called heat kernel, defined as

$$j_a(x,y) = \frac{1}{4\pi a}\, e^{-\frac{x^2+y^2}{4a}}, \qquad (3)$$

is an important example that we will use in our numerical experiments.
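A minimal NumPy sketch of the degradation model (1) with the heat kernel (3) follows. It relies on the fact that $j_a(x,y) = k(x)\,k(y)$ with $k(x) = e^{-x^2/4a}/\sqrt{4\pi a}$, a Gaussian of variance $2a$, so the blur can be applied separably. The following choices are assumptions rather than details from the paper: a unit pixel grid, truncation of $k$ at three standard deviations, renormalisation of the sampled kernel to unit sum, and mirrored boundaries. The function names are hypothetical.

```python
import numpy as np


def heat_kernel_1d(a, radius=None):
    """Sampled 1-D factor k of the heat kernel j_a(x, y) = k(x) k(y)."""
    if radius is None:
        radius = int(np.ceil(3.0 * np.sqrt(2.0 * a)))   # about 3 std devs (variance 2a)
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (4.0 * a)) / np.sqrt(4.0 * np.pi * a)
    return k / k.sum()          # renormalise so the discrete blur preserves constants


def blur(u, a):
    """j_a * u, applied separably (rows, then columns) with mirrored borders."""
    k = heat_kernel_1d(a)
    r = len(k) // 2
    p = np.pad(u, r, mode="symmetric")
    p = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, p)
    p = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, p)
    return p                    # same shape as u


def degrade(u, a, sigma, rng):
    """Model (1): u0 = j_a * u + n, with i.i.d. N(0, sigma^2) noise n."""
    return blur(u, a) + rng.normal(0.0, sigma, size=u.shape)


rng = np.random.default_rng(0)
u = np.zeros((64, 64))
u[:, 32:] = 255.0               # a vertical edge with dynamic range [0, 255]
u0 = degrade(u, a=2.0, sigma=5.0, rng=rng)
```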

Our objective is to estimate $u$ from statistics of the noise, the blur and some a priori knowledge of the image (smoothness, existence of edges). This knowledge is incorporated into the formulation by a regularization functional $R$ that measures the quality of the image $u$, in the sense that smaller values of $R(u)$ correspond to better images. In other words, the process consists in choosing the best quality image among those matching the constraints imposed by the statistics of the noise together with the blur induced by $j$.

In [10], the Total Variation norm or TV-norm is proposed as a regularization functional for the image restoration problem:

$$TV(u) = \int_\Omega |\nabla u|\, dx = \int_\Omega \sqrt{u_x^2 + u_y^2}\, dx. \qquad (4)$$

The TV norm does not penalize discontinuities in $u$, and thus allows us to recover the edges of the original image. Other functionals with similar properties have been introduced in the literature for different purposes (see, for instance, [4,2]). The restoration problem can thus be written as the following constrained optimization problem:

$$\min_u \int_\Omega |\nabla u|\, dx \qquad (5)$$

subject to

$$\frac{1}{2}\int_\Omega (j*u - u_0)^2\, dx = \sigma^2, \qquad (6)$$

and its Euler-Lagrange equation, with homogeneous Neumann boundary conditions for $u$, is

$$0 = \nabla\cdot\left(\frac{\nabla u}{|\nabla u|}\right) - \lambda\, j*(j*u - u_0). \qquad (7)$$

There are known techniques for solving the constrained optimization problem (5) by exploiting solvers for the corresponding unconstrained problem, whose Euler-Lagrange equation is (7) for $\lambda$ fixed.
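The claim that the TV norm (4) does not penalize discontinuities can be checked on a 1D example. A jump and a smooth ramp between the same two values have the same total variation, while a quadratic penalty $\int u_x^2\,dx$ (a standard alternative regularizer, added here only for comparison) grows like $1/\Delta x$ for the jump. The sketch below uses forward differences on a uniform grid. The function names are hypothetical, and `tv_2d` is one common isotropic discretization of (4), not necessarily the one used in the experiments.

```python
import numpy as np


def tv_1d(u):
    """Discrete total variation: sum_i |u[i+1] - u[i]| (grid spacing cancels in 1-D)."""
    return float(np.sum(np.abs(np.diff(u))))


def quadratic_1d(u, dx):
    """Discrete integral of u_x^2, the quadratic (H^1-type) penalty, for comparison."""
    return float(np.sum((np.diff(u) / dx) ** 2) * dx)


def tv_2d(u, h=1.0):
    """Isotropic discrete TV of (4): sum sqrt(u_x^2 + u_y^2) h^2 with forward
    differences and zero difference across the last row/column (Neumann-type)."""
    ux = np.diff(u, axis=1, append=u[:, -1:]) / h
    uy = np.diff(u, axis=0, append=u[-1:, :]) / h
    return float(np.sum(np.sqrt(ux ** 2 + uy ** 2)) * h * h)


x = np.linspace(0.0, 1.0, 101)           # dx = 0.01
dx = x[1] - x[0]
step = np.where(x < 0.5, 0.0, 1.0)       # sharp edge
ramp = x.copy()                          # smooth transition between the same values
print(tv_1d(step), tv_1d(ramp))                         # both about 1
print(quadratic_1d(step, dx), quadratic_1d(ramp, dx))   # about 100 vs about 1
```

Because the jump costs no more than the ramp under TV, minimizing (5) has no incentive to smear edges, whereas the quadratic penalty strongly favours the ramp.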



3 The time dependent model 

Vogel and Oman, and Chan, Golub and Mulet, devised direct methods to approximate the solution to the Euler-Lagrange equation (7) with an a priori estimate of the Lagrange multiplier and homogeneous Neumann boundary conditions. These methods work well for denoising problems, but the removal of blur becomes very ill-conditioned, with a user-dependent choice of parameters. However, stable explicit schemes are preferable when the steady state is quickly reached, because the choice of parameters is almost user-independent. Moreover, the programming for our algorithm is quite simple compared to the implicit inversions needed in the above-mentioned methods.

Usually, time dependent approximations to the ill-conditioned Euler-Lagrange equation (7) are inefficient because, when an explicit scheme is used, the steady state is reached only with a very small time step. This is the case with the following formulation due to Rudin, Osher and Fatemi (see [10]) and Rudin and Osher (see [9]):

$$u_t = -\lambda\, j*(j*u - u_0) + \nabla\cdot\left(\frac{\nabla u}{|\nabla u|}\right), \qquad (8)$$

with $u(x,y,0)$ given as initial data (we have used as initial guess the original blurry and noisy image $u_0$) and homogeneous Neumann boundary conditions, i.e., $\partial u/\partial n = 0$ on the boundary of the domain.

This solution procedure is a parabolic equation with time as an evolution parameter and resembles the gradient-projection method used in [9] and [10]. In this formulation we assume an a priori estimate of the Lagrange multiplier, in contrast with the dynamically changing $\lambda$ used in [9] and [10].

However, this evolution procedure is slow to reach steady state and is also stiff, since the parabolic term is quite singular for small gradients. In fact, an ad hoc rule of thumb indicates that the time step $\Delta t$ and the space step size $\Delta x$ need to be related by

$$\frac{\Delta t}{(\Delta x)^2} \le c\,|\nabla u| \qquad (9)$$

for fixed $c > 0$, for stability. This CFL restriction is what we shall relax to

$$\frac{\Delta t}{(\Delta x)^2} \le c, \qquad (10)$$

for $c$ around 0.5. In order to avoid these difficulties, we propose a new time dependent model that accelerates the movement of level curves of $u$ and regularizes the parabolic term in a nonlinear way. To regularize the parabolic term we multiply the whole Euler-Lagrange equation (7) by the magnitude of the gradient, and our time evolution model reads as follows:

$$u_t = -|\nabla u|\,\lambda\, j*(j*u - u_0) + |\nabla u|\,\nabla\cdot\left(\frac{\nabla u}{|\nabla u|}\right). \qquad (11)$$

We use as initial guess the original blurry and noisy image $u_0$ and homogeneous Neumann boundary conditions as above, with an a priori estimate of the Lagrange multiplier.

From the analytical point of view, this solution procedure approaches the same steady state as the solution of (7) whenever $u$ has nonzero gradient. The effect of this reformulation (i.e., preconditioning) is positive in several respects: the numerical scheme is simple to program, satisfies a maximum principle, and is at least an order of magnitude faster than standard TV implicit procedures. The resulting time evolution problem involves the motion of level sets and has a morphological flavor.

A very simple way to extend the Roe scheme to high order accuracy is described in [8]. For more detail on the numerical method see [5]. We note that staircasing is minimized because our unconventional numerical method gives numerical steady states based on nonoscillatory ideas. These numerical steady states will generally be different from those obtained by [10], [9], [11], [2] and [3], which used standard central differencing.
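As a rough illustration of model (11), the sketch below advances it with explicit (forward Euler) time steps on a unit grid. It is not the scheme described above. For brevity it uses central differences for both terms instead of the ENO Hamilton-Jacobi version of Roe's scheme, and it regularizes $|\nabla u|$ with a small constant `eps`, which is an assumption. The blur is passed as a function `blur` (a hypothetical parameter), which defaults to the identity, i.e. pure denoising. Since $j$ is symmetric, the adjoint in $j*(j*u-u_0)$ is taken to be `blur` itself. With `dt = 0.1` we have $\Delta t/(\Delta x)^2 = 0.1$, well inside the relaxed condition (10).

```python
import numpy as np


def _derivatives(u):
    """Central differences on a unit grid with replicated borders (Neumann condition)."""
    p = np.pad(u, 1, mode="edge")
    ux = 0.5 * (p[1:-1, 2:] - p[1:-1, :-2])
    uy = 0.5 * (p[2:, 1:-1] - p[:-2, 1:-1])
    uxx = p[1:-1, 2:] - 2.0 * u + p[1:-1, :-2]
    uyy = p[2:, 1:-1] - 2.0 * u + p[:-2, 1:-1]
    uxy = 0.25 * (p[2:, 2:] - p[2:, :-2] - p[:-2, 2:] + p[:-2, :-2])
    return ux, uy, uxx, uyy, uxy


def evolve(u0, lam, steps, dt=0.1, eps=1e-3, blur=lambda v: v):
    """Forward-Euler sketch of (11):
    u_t = -|grad u| lam j*(j*u - u0) + |grad u| div(grad u / |grad u|)."""
    u0 = np.asarray(u0, dtype=float)
    u = u0.copy()
    for _ in range(steps):
        ux, uy, uxx, uyy, uxy = _derivatives(u)
        g2 = ux ** 2 + uy ** 2 + eps ** 2          # regularised |grad u|^2
        # eps-regularised |grad u| div(grad u/|grad u|)
        #   = (u_xx u_y^2 - 2 u_x u_y u_xy + u_yy u_x^2) / |grad u|^2
        curvature_term = (uxx * (uy ** 2 + eps ** 2) - 2.0 * ux * uy * uxy
                          + uyy * (ux ** 2 + eps ** 2)) / g2
        fidelity = blur(blur(u) - u0)              # j*(j*u - u0); j symmetric
        u = u + dt * (curvature_term - lam * np.sqrt(g2) * fidelity)
    return u


rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[:, 32:] = 1.0
noisy = clean + rng.normal(0.0, 0.1, clean.shape)
restored = evolve(noisy, lam=0.05, steps=50)
```

The curvature term moves level lines by their curvature. Small noise-induced level curves therefore shrink and disappear, while the straight level lines of the edge are left essentially in place.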



4 Numerical Experiments 

We have used 1D signals with values in the range [0,255]. Fig. 1 (left) shows the original signal versus the blurred and noisy signal with $\sigma = 5$ and SNR ≈ 5. Fig. 1 (right) shows the original signal versus the recovered signal after 80 iterations of the first order scheme with CFL number 0.25. The estimated $\lambda = 0.25$ was computed as the maximum value allowed for stability, using explicit Euler time stepping.

Fig. 1. Left: original vs. noisy and blurred 1D signal; right: original vs. recovered 1D signal.



Fig. 2. Left: original image; right: noisy image, SNR ≈ 3.



Our 2D numerical experiments were performed on the original image (Fig. 2, left), with 256 × 256 pixels and dynamic range [0,255]. The third order scheme we used in our 2D experiments was based on a third order accurate ENO Hamilton-Jacobi version of Roe's scheme described in [8] (see details in [5]). Our 2D experiment was made on the noisy image (Fig. 2, right), with an SNR of approximately 3. Details of the approximate solutions obtained with the Chan-Golub-Mulet primal-dual method and with our time dependent model using the third order Roe's scheme (described above) are shown in Fig. 3. We used $\lambda \approx 0.0713$ and performed 50 iterations with CFL number 0.1.
Fig. 3. Resolution 256 × 256, SNR ≈ 3, estimated λ = 0.0713. Left: image obtained by the Chan-Golub-Mulet primal-dual method; right: image obtained by our time evolution model, with 50 time steps and CFL = 0.1.
Abstract. The maxima of the Curvature Scale Space (CSS) image have already been used to represent 2-D shapes under affine transforms. Since the CSS image employs the arc length parametrisation, which is not affine invariant, we expect some deviation in the maxima of the CSS image under general affine transforms.

In this paper we examine the advantage of using affine length rather than arc length to parametrise the curve prior to computing its CSS image. This parametrisation has been proven to be invariant under affine transformation and has been used in many affine invariant shape recognition methods.

The CSS representation with affine length parametrisation has been used to find similar shapes from a large prototype database.

Keywords: Curvature scale space, Affine transformation, Image databases, Shape similarity retrieval, Affine length

1 Introduction 

The CSS representation finds its roots in curvature deformation and the heat equation. In fact, the resampled curvature scale space [6] implements curvature deformation [4]. This is carried out by convolving each coordinate of a closed planar curve with a Gaussian function at different levels of scale. At each stage, before being convolved by a larger width Gaussian, the curve is represented in terms of the arc length parameter. In the regular curvature scale space [6] the resampling is not applied. As a result, the evolution is no longer a curvature deformation. However, the implementation is much faster, and the representation has shown good performance in shape similarity retrieval [1][5] under similarity transforms.

Both the regular and the resampled CSS image employ the arc length parametrisation, which is not affine invariant. As a result, we expect some deviation in the maxima of the CSS image under general affine transformation. It has been shown that affine invariance can only be achieved by an affine invariant parametrisation, and affine length has been used by a number of authors [2][3]. In this paper we examine the utility of using affine length rather than arc length to parametrise the curve prior to computing its CSS image.

M. Nielsen et al. (Eds.): Scale-Space'99, LNCS 1682, pp. 435-440, 1999.
© Springer-Verlag Berlin Heidelberg 1999

S. Abbasi and F. Mokhtarian

Fig. 1. (a) Curve evolution, from left: σ = 1, 4, 7, 10, 12, 14. (b) The regular CSS image of the shape. (c) The resampled CSS image.

We have a database of 1100 images of marine creatures. The contours in this 
database demonstrate a great range of shape variation. A database of 5000 con- 
tours has been constructed using 500 real object boundaries and 4500 contours 
which are the affine transformed versions of real objects. Both regular and re- 
sampled CSS representations are constructed with affine length parametrisation 
and then used to find similar shapes from this prototype database. 

2 The CSS image 

This section describes the process of CSS construction for both regular and 
resampled CSS image. The use of affine length instead of arc length and finally 
the CSS matching are also briefly explained. 

Construction of the regular CSS image: In order to use arc length, the curve is resampled and represented by 200 equally distant points. Considering the resampled curve as $\Gamma_0(s) = (x_0(s), y_0(s))$, we smooth the curve with a Gaussian function:

$$X(s,\sigma) = x_0(s) * g(s,\sigma), \qquad Y(s,\sigma) = y_0(s) * g(s,\sigma).$$

The smoothed curve is called $\Gamma_\sigma$, where $\sigma$ denotes the width of the Gaussian kernel. It is then possible to find the locations of curvature zero crossings on $\Gamma_\sigma$ [5]. The process starts with $\sigma = 1$, and at each level $\sigma$ is increased by $\Delta\sigma$, chosen as 0.1 in our experiments. As $\sigma$ increases, $\Gamma_\sigma$ shrinks and becomes smoother, and the number of curvature zero crossing points on it decreases. Finally, when $\sigma$ is sufficiently high, $\Gamma_\sigma$ will be a convex curve with no curvature zero crossings (see Figure 1(a)). The process of creating ordered sequences of curves is referred to as the evolution of $\Gamma$.

If we determine the locations of curvature zero crossings of every $\Gamma_\sigma$ during evolution, we can display the resulting points in the $(u, \sigma)$ plane, where $u$ is an approximation of the normalised arc length and $\sigma$ is the width of the Gaussian kernel. The result of this process can be represented as a binary image called the regular CSS image of the curve (see Figure 1(b)).
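The construction above can be sketched in NumPy as follows. The curve is resampled to 200 points equally spaced in arc length, and its coordinates are smoothed with a Gaussian of width $\sigma$, measured in samples, for $\sigma = 1, 1.1, \ldots$. At each level the sketch records where the curvature $\kappa = (X_u Y_{uu} - X_{uu} Y_u)/(X_u^2 + Y_u^2)^{3/2}$ changes sign. Several choices here are assumptions and not necessarily the implementation of [5]: smoothing and derivatives are computed in the Fourier domain for a closed curve, zero crossings are detected as sign changes between consecutive samples, and all function names are hypothetical.

```python
import numpy as np


def resample_by_arclength(x, y, n=200):
    """Resample a closed polygon (x, y) at n points equally spaced in arc length."""
    xc, yc = np.append(x, x[0]), np.append(y, y[0])        # close the polygon
    s = np.concatenate(([0.0], np.cumsum(np.hypot(np.diff(xc), np.diff(yc)))))
    t = np.linspace(0.0, s[-1], n, endpoint=False)
    return np.interp(t, s, xc), np.interp(t, s, yc)


def curvature_zero_crossings(x, y, sigma):
    """Sample indices where the curvature of the Gaussian-smoothed closed curve changes sign."""
    n = len(x)
    w = 2.0 * np.pi * np.fft.fftfreq(n)
    g = np.exp(-0.5 * (sigma * w) ** 2)      # Gaussian transfer function, width sigma samples
    d1 = 1j * w * g
    if n % 2 == 0:
        d1[n // 2] = 0.0                     # drop the unpaired Nyquist term
    d2 = -(w ** 2) * g
    fx, fy = np.fft.fft(x), np.fft.fft(y)
    xu, yu = np.real(np.fft.ifft(fx * d1)), np.real(np.fft.ifft(fy * d1))
    xuu, yuu = np.real(np.fft.ifft(fx * d2)), np.real(np.fft.ifft(fy * d2))
    kappa = (xu * yuu - xuu * yu) / (xu ** 2 + yu ** 2) ** 1.5
    sign = np.sign(kappa)
    return np.nonzero(sign != np.roll(sign, -1))[0]


def regular_css(x, y, sigma0=1.0, dsigma=0.1, sigma_max=100.0):
    """(u, sigma) points of a regular CSS image; u is the normalised arc-length index."""
    x, y = resample_by_arclength(x, y)
    n = len(x)
    points, sigma = [], sigma0
    while sigma <= sigma_max:
        zc = curvature_zero_crossings(x, y, sigma)
        if len(zc) == 0:                     # the smoothed curve is convex: stop
            break
        points.extend((i / n, sigma) for i in zc)
        sigma += dsigma
    return points


theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
r = 1.0 + 0.3 * np.cos(5.0 * theta)          # a non-convex five-lobed test shape
css_points = regular_css(r * np.cos(theta), r * np.sin(theta))
```

The local maxima of the contours formed by these points are the features that are compared during matching.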
Construction of the resampled CSS image: The process of constructing the resampled CSS image is slightly different. It starts by convolving each coordinate of the initial curve $\Gamma_0(s)$ with a small width Gaussian filter. The resulting curve is re-parametrised by the normalised arc length and convolved again with the same filter. This process is repeated until the curve becomes convex and no longer has a curvature zero crossing. The curvature zero crossings of each curve are marked in the resampled CSS image.

Affine length: In order to achieve an affine invariant parametrisation, arc length $s$ is usually replaced by affine length $\tau$, defined as

$$\tau(u) = \int_0^{u} \left(\dot{x}(v)\,\ddot{y}(v) - \ddot{x}(v)\,\dot{y}(v)\right)^{1/3} dv,$$

where dots denote derivatives with respect to the curve parameter. The main disadvantage of affine length is that its computation requires higher order derivatives. However, by using the method described in [5], we can parametrise the curve using this formula.

Both regular and resampled CSS images can be reconstructed using affine length instead of arc length. In the regular CSS image, only the initial parametrisation uses affine length, and re-parametrisation is not applied. In the resampled CSS image, however, the resulting curve is re-parametrised by affine length after each iteration.
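A minimal sketch of affine length re-parametrisation follows. It resamples a closed curve at points equally spaced in the affine length $\tau$. The method of [5] is not reproduced. Instead, as an assumption, the derivatives are estimated in the Fourier domain on a copy smoothed with a Gaussian of width `sigma` samples. The absolute value is taken before the cube root, another assumption, so that $\tau$ keeps increasing along concave parts. The function name is hypothetical. Because a shear leaves $\dot{x}\ddot{y} - \ddot{x}\dot{y}$ unchanged, the resampled points of a sheared curve are the sheared resampled points of the original.

```python
import numpy as np


def affine_length_resample(x, y, n=200, sigma=2.0):
    """Resample a closed curve at n points equally spaced in affine length."""
    m = len(x)
    w = 2.0 * np.pi * np.fft.fftfreq(m)
    g = np.exp(-0.5 * (sigma * w) ** 2)       # Gaussian pre-smoothing (assumption)
    d1 = 1j * w * g
    if m % 2 == 0:
        d1[m // 2] = 0.0                      # drop the unpaired Nyquist term
    d2 = -(w ** 2) * g
    fx, fy = np.fft.fft(x), np.fft.fft(y)
    xu, yu = np.real(np.fft.ifft(fx * d1)), np.real(np.fft.ifft(fy * d1))
    xuu, yuu = np.real(np.fft.ifft(fx * d2)), np.real(np.fft.ifft(fy * d2))
    dtau = np.abs(xu * yuu - xuu * yu) ** (1.0 / 3.0)     # |x'y'' - x''y'|^(1/3)
    tau = np.concatenate(([0.0], np.cumsum(dtau)))        # cumulative affine length
    xc, yc = np.append(x, x[0]), np.append(y, y[0])       # close the curve
    t = np.linspace(0.0, tau[-1], n, endpoint=False)
    return np.interp(t, tau, xc), np.interp(t, tau, yc)


theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
ex, ey = 3.0 * np.cos(theta), np.sin(theta)                # an ellipse
xa, ya = affine_length_resample(ex, ey)
xs, ys = affine_length_resample(ex + ey, ey)               # the ellipse sheared with k = 1
```

For an ellipse traced as $(a\cos\theta, b\sin\theta)$, the integrand equals $ab$ and is therefore constant, so affine length is proportional to $\theta$. The resampled points then coincide with every other input sample.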

Curvature Scale Space Matching: We assume that the user enters a query by pointing to an image. The same preprocessing is done to find the maxima of the CSS contours of the input shape, which are then compared with the same descriptors of the database objects. The algorithm used for comparing two sets of maxima, one from the input and the other from one of the models, has been described in [5]. The algorithm first finds any possible changes in orientation which may have occurred in one of the two shapes. A circular shift is then applied to one of the two sets to compensate for the effects of the change in orientation. The sum of the Euclidean distances between the relevant pairs of maxima is then defined to be the matching value between the two CSS images.
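The matching step can be roughly illustrated by the following sketch. It is a hypothetical simplification, not the algorithm of [5]. Candidate circular shifts are obtained by aligning the tallest query maximum with each model maximum. For each shift, maxima are paired greedily (tallest first) by circular Euclidean distance in the $(u, \sigma)$ plane. As an assumption, any unmatched maximum adds its height $\sigma$ to the cost. The smallest total over all shifts is returned.

```python
import numpy as np


def css_match_cost(query, model):
    """Hypothetical simplified matcher for two lists of CSS maxima (u, sigma), u in [0, 1)."""
    q = np.asarray(query, dtype=float).reshape(-1, 2)
    m = np.asarray(model, dtype=float).reshape(-1, 2)
    if len(q) == 0 or len(m) == 0:
        return float(q[:, 1].sum() + m[:, 1].sum())    # everything unmatched
    top_u = q[np.argmax(q[:, 1]), 0]                   # u of the tallest query maximum
    best = np.inf
    for candidate_u in m[:, 0]:                         # candidate circular shifts
        shifted = q.copy()
        shifted[:, 0] = (shifted[:, 0] + candidate_u - top_u) % 1.0
        unused = list(range(len(m)))
        cost = 0.0
        for u, s in shifted[np.argsort(-shifted[:, 1])]:   # tallest maxima first
            if not unused:
                cost += s                                  # unmatched query maximum
                continue
            du = np.abs(m[unused, 0] - u)
            du = np.minimum(du, 1.0 - du)                  # circular distance along u
            dist = np.hypot(du, m[unused, 1] - s)
            k = int(np.argmin(dist))
            cost += dist[k]
            unused.pop(k)
        cost += m[unused, 1].sum()                         # unmatched model maxima
        best = min(best, cost)
    return float(best)


query = [(0.1, 5.0), (0.4, 3.0), (0.8, 2.0)]
model = [((u + 0.25) % 1.0, s) for u, s in query]          # same shape, shifted start point
```

A change of starting point only shifts the maxima circularly along $u$, so in the usage example the cost of `query` against `model` is essentially zero.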



3 Experiments and results 



In this paper, we examine the performance of the CSS representation under a combination of rotation and shear transforms, represented by the following matrices:

$$A_{\text{rotation}} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad A_{\text{shear}} = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}.$$

The measure of shape deformation depends on the parameter $k$, the shear ratio, in the matrix $A_{\text{shear}}$. In the present form of the matrix $A_{\text{shear}}$, the $x$ axis is called the shear axis, as the shape is pulled toward this direction.
Fig. 2. The deformation of shapes is considerable even with k = 1 in the shear transform. The original shape is shown at top left. The others show transformations with k = 1 and θ = 20°, 40°, ..., 160°, 180°.



Figure 2 shows the effects of affine transformation on shape deformation. In this figure, the shear ratio is $k = 1$. In order to achieve different shear axes, we changed the orientation of the original shape prior to applying the pure shear transformation. The values of $\theta$ range from 20° to 180° in 20° intervals. As this figure shows, the deformation is severe for $k = 1$. For larger values of $k$, e.g. 1.5 and 2, the deformation is much more severe.

In order to create three different databases, we chose three different values for the shear ratio: 1.0, 2.0 and 3.0. We then applied the transformation to a database of 500 original object contours. From every original object, we obtained 9 transformed shapes with different values of $\theta$. Therefore, each database consisted of 500 original and 4500 transformed shapes.
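The construction of each test database can be sketched as follows. Each contour is first rotated by $\theta = 20°, 40°, \ldots, 180°$ and then sheared with ratio $k$, i.e. $A_{\text{shear}} A_{\text{rotation}}$ is applied to every point. Contours are assumed to be stored as $(2, N)$ arrays with $x$ and $y$ rows, and the function names are hypothetical. Both matrices have determinant 1, so enclosed area is preserved even though the shape is strongly distorted.

```python
import numpy as np


def a_rotation(theta_deg):
    th = np.deg2rad(theta_deg)
    return np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])


def a_shear(k):
    """Shear along the x axis (the shear axis) with shear ratio k."""
    return np.array([[1.0, k], [0.0, 1.0]])


def transformed_versions(contour, k):
    """Rotate, then shear, a (2, N) contour; one shape per theta = 20, 40, ..., 180 degrees."""
    return [a_shear(k) @ a_rotation(theta) @ contour for theta in range(20, 181, 20)]


def polygon_area(contour):
    """Shoelace formula for a closed polygon given as a (2, N) array."""
    x, y = contour
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


square = np.array([[0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
database = {k: transformed_versions(square, k) for k in (1.0, 2.0, 3.0)}
```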

In order to evaluate the performance of the method, every original shape was 
selected as the input query and the first n outputs of the system were observed 
to see if the transformed versions of the query are retrieved by the system. The 
results indicated that the performance of regular CSS is much better than the 
resampled CSS and using affine length parametrisation instead of arc length 
improves the performance of both representations. 

Considering each original (i.e., not affine transformed) shape as an input query, we observed the first $n$ outputs of the system and determined $m$, the number of outputs which are affine transformed versions of the input. The success rate for a particular input is calculated as $m/m_{\max}$, where $m_{\max}$ is the maximum possible value of $m$. Note that $m_{\max}$ is equal to $n$ if $n < 10$; otherwise, $m_{\max}$ is equal to 10. The success rate of the system for the whole database is the average of the success rates for each input query.
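The evaluation measure can be written in a few lines. For one query, $m$ counts the relevant items among the first $n$ ranked outputs and $m_{\max} = \min(n, 10)$. The database-wide rate is the mean over queries. This is a minimal sketch, and the names and the representation of results as (ranked ids, set of relevant ids) pairs are hypothetical.

```python
def success_rate(ranked_ids, relevant_ids, n, cap=10):
    """m / m_max for one query, with m_max = n if n < cap, else cap."""
    m = sum(1 for r in ranked_ids[:n] if r in relevant_ids)
    return m / min(n, cap)


def average_success_rate(queries, n):
    """Mean success rate over (ranked_ids, relevant_ids) pairs."""
    rates = [success_rate(ranked, relevant, n) for ranked, relevant in queries]
    return sum(rates) / len(rates)


ranked = ["q1", "x", "q2", "y", "q3"]
relevant = {"q1", "q2", "q3"}
print(success_rate(ranked, relevant, 2), success_rate(ranked, relevant, 20))
```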

We chose different values for $n$, ranging from 2 to 40, and in each case found the average success rate of the system for all 500 original shapes. The same experiment was carried out on four different CSS representations: the regular and the resampled CSS image, each with arc length or affine length parametrisation. The results are presented in Figures 3(a) to 3(d). Each figure includes three curves associated with three values of $k$, the shear ratio. Each curve shows the average success rate for the particular type of CSS representation and for different values of $n$, the number of observed outputs.
Fig. 3. Identifying transformed versions of the input query: (a) regular CSS with arc length; (b) resampled CSS with arc length; (c) regular CSS with affine length; (d) resampled CSS with affine length. k is the shear ratio and represents the measure of deformation (see Section 3).



Starting with Figure 3(a), we observe that the conventional regular CSS image shows good results. For example, with $k = 1$ and in spite of severe deformation, more than 93% of outputs are always affine transformed versions of the input query. This figure drops to 80% as $k$ increases, but it is still reasonably large.

The resampled CSS image with arc length parametrisation is quite vulnerable to affine transforms. In most cases, none of the transformed versions of the input query appear among the first few outputs of the system.

With affine length parametrisation, both the regular and the resampled CSS image show much better results. Almost all affine transformed versions of an input query appear among the first outputs of the system (see Figures 3(c) and 3(d)). The results are also robust with respect to $k$, the shear ratio.

In conclusion, we observe the following.

- The regular CSS image is largely robust with respect to affine transforms.

- The resampled CSS image is not robust with respect to affine transforms with arc length parametrisation, but it is robust with affine length parametrisation.

- Since the transformation is applied mathematically, the effects of pre-processing noise have not been considered. In real world applications, when the object boundaries must be extracted from images taken from different camera viewpoints, noise changes the object boundaries dramatically. However, we expect that using affine length instead of arc length improves the performance of both regular and resampled CSS images even in the presence of such noise.

4 Conclusion 

The maxima of the Curvature Scale Space (CSS) image have been used to represent closed planar curves in shape similarity retrieval under affine transforms. Two types of representation, namely regular and resampled, were examined. The curve evolution in the resampled CSS image is an implementation of curvature deformation. In the regular CSS image, however, it is only an approximation of curvature deformation.

In conventional forms, arc length parametrisation is used in both types. In 
this paper we examined the utility of using affine length instead of arc length 
to parametrise the curve prior to computing its CSS image. In different sections 
of this paper, we reviewed the background of the representations as well as 
parametrisations. We then carried out a number of experiments to compare the 
performance of our shape similarity system using different approaches. 

We observed that the performance of the regular CSS representation in shape similarity retrieval under affine transforms is much better than that of the resampled CSS representation. We also observed that both representations improve when affine length parametrisation is used.
Abstract. The notion of a stochastic scale space has been introduced
through a stochastic approximation to the Perona-Malik equation. The 
approximate solution has been shown to preserve scale-space causality 
and is well-posed in an expected sense. The algorithm also converges to 
a (unique) constant image. 

Keywords : stochastic scale space, discrete scale space, stochastic ap- 
proximation 



1 Introduction 

Images can be represented at a variety of scales through a multiscale charac- 
terization [4]. Among the several methods used to obtain a multiscale charac- 
terization, it has been shown [1,7] that the PDE approach is the most generic 
and most other approaches can be re-cast in the framework of PDEs. Interest 
in scale-space theories has increased after Perona and Malik [5] proposed a non- 
linear scale-space based on a nonlinear diffusion PDE which smooths different 
regions of the image at different rates, thereby accentuating edges. 

Although the Perona-Malik equation had impressive results, several modifi- 
cations have been suggested to avoid its theoretical and numerical difficulties. 
This paper proposes a stochastic approximation to the discretized Perona-Malik 
equation using a system of particles distributed on the pixel array and evolving 
according to probabilistic rules. For a fixed pixel array, the stochastic algorithm 
is theoretically well-posed (has a unique stationary distribution and the expec- 
tation of the one-step evolution matrix is Lipschitz continuous). As the pixel 
distance goes to zero, the solution of the stochastic algorithm converges weakly 
[2] to a unique solution. If the Perona-Malik equation has a solution, at least in a weak sense, the stochastic algorithm converges to this solution. If, however, the Perona-Malik equation has no solution [3], the algorithm is merely a stochastic approximation which converges to a unique solution.

The motivation for using a particle system is that the state of the system (characterized by the number of particles at each site) directly corresponds to a set of gray-level values. Since digital images are invariably quantized, any non-integral solution, as obtained for example by classical numerical methods, has to be scaled and quantized, resulting in some loss of information and small inaccuracies. Further, the image at any particular scale is a quantization of the solution at a particular instant of time rather than the solution itself. The semigroup property (that an image at scale $t_1 + t_2$ may be obtained by observing at scale $t_2$ an image already approximated at scale $t_1$) then does not strictly hold for the quantized image, since the intermediate image is different from the intermediate (real-valued) solution and hence corresponds to a different initial value for further approximation. The particle system described in this paper has the advantage of maintaining a valid image as the solution at every instant.



2 A stochastic scale space 



In this section, we first consider a discretization of the Perona-Malik equation. 
A stochastic algorithm is then formulated in such a manner that it is “locally 
consistent” with the discretized equation. By this we mean that the stochastic 
algorithm evolves for each iteration, in an expected sense, in the same way as 
the deterministic equation. 

The Perona-Malik equation is given by

$$\frac{\partial f}{\partial t} = \nabla\cdot\big(D(|\nabla f|)\,\nabla f\big), \qquad (1)$$

where $f : S \times [0,\infty) \to \mathbb{R}$ represents the evolution with time of the image defined on a compact set $S$. $D(\cdot)$ is a decreasing function of the gradient computed from the solution at every instant of time:



$$D(x) = \exp\left(-\frac{x^2}{2K^2}\right). \qquad (2)$$



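Using a standard second-order discretization, a discretized version of (1) can be obtained as

$$f^{(n+1)}(i) = f^{(n)}(i) + \lambda \sum_{j \in \mathcal{N}(i)} D\big(f^{(n)}(j) - f^{(n)}(i)\big)\,\big(f^{(n)}(j) - f^{(n)}(i)\big), \qquad (3)$$

where $n$ denotes the discrete time instants and $i$ the pixel coordinate. $\mathcal{N}(i)$ is the 4-neighbourhood of $i$.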
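A minimal NumPy sketch of one explicit step of (3) is given below, with the edge-stopping function in the form (2). Replicated borders are an assumption. They give zero difference across the image boundary, so the sum of gray values is conserved. The function names are hypothetical. With $\lambda \le 1/4$ (and $D \le 1$), each new value is a convex combination of old values, so the step cannot create new extrema.

```python
import numpy as np


def D(x, K):
    """Edge-stopping function (2): exp(-x^2 / (2 K^2))."""
    return np.exp(-(x ** 2) / (2.0 * K ** 2))


def perona_malik_step(f, lam, K):
    """One explicit step of (3) on the 4-neighbourhood, with replicated borders."""
    f = np.asarray(f, dtype=float)
    p = np.pad(f, 1, mode="edge")
    neighbours = (p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:])
    update = sum(D(nb - f, K) * (nb - f) for nb in neighbours)
    return f + lam * update


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(float)
out = img
for _ in range(20):
    out = perona_malik_step(out, lam=0.25, K=10.0)
```

Note that the flux $x\,D(x)$ is largest at $|x| = K$, so differences above $K$ are smoothed progressively less. This is the discrete counterpart of the edge-preserving behaviour.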

We now formulate a stochastic algorithm with the same kind of behaviour 
as (3) in an expected sense. 



2.1 Stochastic algorithm 

To formulate a stochastic algorithm which represents (3) in an expected sense, 
we treat the image as a system of particles with the gray level at any pixel 
corresponding to the number of particles. 
Let $S$ denote the pixel array on which the image is defined and $\mathcal{N}(\cdot)$ the symmetric 4-neighbourhood. $X_i(n)$ denotes the number of particles at pixel $i$ at time instant $n$.

At each instant of time $n$, each $X_i(\cdot)$ evolves according to the following probabilistic rule:

1. Each pixel $i$ chooses one of its neighbours $j \in \mathcal{N}(i)$ according to a uniform selection probability.

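2. The neighbour $j$ is "accepted" with probability

$$D\big(X_{i+j}(n) - X_i(n)\big), \qquad (4)$$

where $i+j$ denotes the coordinate-wise addition of $i$ and $j$. The total probability of a transition from $i$ to $j$ is then given by $\Pr(i,j) = \frac{1}{4}\,D\big(X_{i+j}(n) - X_i(n)\big)$, with

$$\Pr(i,i) = 1 - \sum_{j \in \mathcal{N}(i)} \Pr(i,j). \qquad (5)$$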

3. If the neighbour is "accepted", $X_i(n)$ and $X_j(n)$ are both updated according to the rule

$$X_i(n+1) = X_i(n) + \operatorname{int}\big[\lambda\,(X_j(n) - X_i(n))\big],$$
$$X_j(n+1) = X_j(n) - \operatorname{int}\big[\lambda\,(X_j(n) - X_i(n))\big]. \qquad (6)$$

An additional particle is transferred (to the neighbour with the lesser number of particles) with probability $\lambda\,|X_j(n) - X_i(n)| - \operatorname{int}\big[\lambda\,|X_j(n) - X_i(n)|\big]$.

Each $X_i(n+1)$ is thus a convex combination of its neighbouring values, and this effects a smoothing at $i$. This smoothing takes place with a probability that decreases with the strength of the edge, so that stronger edges are less likely to be smoothed and weaker edges are more likely to be smoothed.

The stochastic system evolves, in the expected sense, for one time step in the 
same way as (3). 
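The particle rule 1-3 can be sketched in NumPy as follows, under several stated assumptions. Pixels are visited sequentially in random order within each sweep, which avoids conflicting simultaneous updates. A neighbour outside the array is treated as rejected. The acceptance probability uses the form (2) of $D$. $\lambda$ is restricted to at most 1/2, so a transfer never overshoots the pair's midpoint by more than one particle. The function names are hypothetical. Every move only shifts particles between two sites, so the total count is conserved, and no value ever leaves the range of the pair it came from.

```python
import numpy as np

OFFSETS = ((1, 0), (-1, 0), (0, 1), (0, -1))   # symmetric 4-neighbourhood


def D(x, K):
    """Acceptance probability, using the edge-stopping function (2)."""
    return np.exp(-(x * x) / (2.0 * K * K))


def stochastic_sweep(X, lam, K, rng):
    """One sweep of the particle rule over an integer array X (modified in place)."""
    assert 0.0 < lam <= 0.5
    rows, cols = X.shape
    for flat in rng.permutation(rows * cols):
        r, c = divmod(int(flat), cols)
        dr, dc = OFFSETS[rng.integers(4)]        # rule 1: uniform neighbour choice
        rn, cn = r + dr, c + dc
        if not (0 <= rn < rows and 0 <= cn < cols):
            continue                             # outside the array: no move (assumption)
        d = int(X[rn, cn]) - int(X[r, c])
        if d == 0 or rng.random() >= D(d, K):
            continue                             # rule 2: neighbour not accepted
        amount = lam * abs(d)
        k = int(amount)                          # rule 3: int[lam |d|] particles ...
        if rng.random() < amount - k:
            k += 1                               # ... plus one with prob. = fractional part
        s = 1 if d > 0 else -1                   # particles flow towards the smaller count
        X[r, c] += s * k
        X[rn, cn] -= s * k
    return X


rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(32, 32))
X = img.copy()
for _ in range(20):
    stochastic_sweep(X, lam=0.5, K=20.0, rng=rng)
```

With a very large $K$ (so that every neighbour is accepted) and enough sweeps, the sketch should settle in an image whose values differ by at most one. This matches the "almost constant" limit described in Property 3 below.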

Since the evolution algorithm is stochastic, the variance of the algorithm 
plays an important part. The variance should be small, if the same ‘features’ 
are to be preserved in every run of the algorithm. We show that the variance, 
particularly at the edges, is bounded by a “small” quantity. 

For the specific form of $D(\cdot)$ that we choose, the variance of the increment $X_i(n+1) - X_i(n)$ at edge points (where $|X_j(n) - X_i(n)| > K$) is given by

/XRn) - W(n)\^”" 

Var {Xiin + 1) - X^(n)) < ^ k 

j€Af{i) A / 

where n is any positive integer. 

Thus the algorithm preserves the same features in different runs although 
each run produces a slightly different result. 
3 Properties of the stochastic algorithm

In this section, we state some properties of the stochastic algorithm which justify 
its application for a multiscale representation. Proofs are omitted here and may 
be found in [6]. 

- Property 1 (Maximum principle): No local maximum is increased and no local minimum is decreased.

- Property 2: The algorithm is well-posed in an expected sense.

Weickert [7] has shown that Lipschitz continuity of the evolution matrix is sufficient for well-posedness. This can be proven in the $L_1$ norm of the evolution matrix $A = (a_{ij})$:

$$\|A\|_1 = \sum_{i,j} |a_{ij}|.$$

Note 1. Well-posedness fails for the nonlinear diffusion equation proposed by Perona and Malik because, if the gradient value exceeds the threshold $K$, the equation behaves as an inverse diffusion equation. However, in the semi-discrete case we use a finite difference rather than the gradient value, and Lipschitz continuity is preserved in the discrete norm. This is also why some authors [7] note that a discretization on a finite lattice provides sufficient regularization for the Perona-Malik equation.

- Property 3: The Markov chain $X(\cdot)$ has a unique stationary distribution. It can be shown [6] that every state for which $\max_i X_i(\cdot) - \min_i X_i(\cdot) > 1$ is a transient state. The unique absorbing state of the system is the constant image, which has the same number of particles at all pixels. The constant image is obtained if and only if the sum of gray values in the image is an exact multiple of the number of pixels. If such a state is not possible, the algorithm converges to an "almost constant" image, where the maximum difference between the gray levels at any two pixels is 1.

4 Results and Discussion 

The stochastic algorithm has been tested on several real images. It has been 
found that the algorithm is able to preserve the sharpness of boundaries while 
smoothing region interiors effectively. Results on the cheetah image (Figure 1) 
show that the algorithm correctly identifies the spots of the cheetah as part of the 
region interiors and smooths them in preference to inter-region smoothing. This 
is in spite of the fact that no textural features have been used. The stochastic 
solution also gives almost segmented regions. 

The images at multiple resolutions are obtained, as in other PDE formalisms, by stopping the evolution at various times. The difference is that the scale space generated by this algorithm is stochastic in nature, meaning that the image approximated at any scale is obtained through a stochastic evolution. Hence, different runs of the algorithm could presumably result in slightly different images. However, experimentally, there was no perceivable difference between different runs.
Fig. 1. Results on cheetah image: original image, and results after 50, 100 and 200 iterations.
Fig. 2. Results on telephone booth image
Abstract. In this paper we address two important problems in motion analysis: the detection of moving objects and their localization. Statistical and level set approaches are adopted in order to formulate these problems. For the change detection problem, the inter-frame difference is modeled by a mixture of two zero-mean Laplacian distributions. First, statistical tests using criteria with negligible error probability are used to label as many sites as possible as changed or unchanged. All the connected components of the labeled sites are seed regions, which give the initial level sets, for which velocity fields for label propagation are provided. We introduce a new multi-label fast marching algorithm for expanding competitive regions. The solution of the localization problem is based on the previously extracted map of changed pixels. The boundary of the moving object is determined by a level set algorithm, which is initialized by two curves evolving in converging opposite directions. The sites of curve contact determine the position of the object boundary. To illustrate the efficiency of the proposed approach, experimental results are presented on real video sequences.



1 Introduction 

Detection and localization of moving objects in an image sequence is a crucial issue in moving video [11], as well as for a variety of Computer Vision applications, including object tracking, fixation and 2-D/3-D motion estimation. This paper deals with these two problems for the case of a static scene.

Spatial Markov Random Fields (MRFs), through Gibbs distributions, have been widely used for modeling the change detection problem [1], [7], [9]. Approaches based on contour evolution [5], [2], or on partial differential equations have also been proposed in the literature. In [3] a three-step algorithm is proposed, consisting of contour detection, estimation of the velocity field along the detected contours and, finally, determination of the moving contours. In [8], the contours to be detected and tracked are modeled as geodesic active contours.

In this paper we propose a new method based on level set approaches. An innovative idea here is that the propagation speed is label dependent. Thus, for the problem of change detection, where two labels characterize image sites, an initial statistical test gives seeds for performing the contour propagation. The propagation of the labels is implemented using an extension of the fast marching algorithm, named the multi-label fast marching algorithm. The change detection maps are used to initialize another level set algorithm, based on the spatial gradient, for tracking the moving object boundary. For more accurate results and for an automatic stopping criterion, two fronts are propagated in converging opposite directions, and they are designed to meet on the object boundary, where the spatial gradient is maximum.

The remainder of this paper is organized as follows. In Section 2 we consider 
the motion detection problem and we propose a method for initially labeling sites 
with high confidence. In Section 3 a new algorithm based on the level set approach is introduced for propagating the initial labels. In Section 4, we present the moving
object localization problem, as well as a fast marching algorithm for locating 
the object’s boundary. In order to check the efficiency and the robustness of the 
proposed method, experimental results are presented on real image sequences. 



2 Detection of moving objects 

Let $D = \{d(x,y) = I(x,y,t+1) - I(x,y,t),\ (x,y) \in S\}$ denote the gray level difference image. The change detection problem consists of assigning a “binary” label $\Theta(x,y)$ to each pixel on the image grid. We associate the random field $\Theta(x,y)$ with two possible events, $\Theta(x,y) = \text{static}$ (unchanged pixel) and $\Theta(x,y) = \text{mobile}$ (changed pixel). Let $p_{D|\text{static}}(d|\text{static})$ and $p_{D|\text{mobile}}(d|\text{mobile})$ be the probability density functions of the observed inter-frame difference under the two hypotheses. These probability density functions are assumed to be homogeneous, i.e. independent of the pixel location, and they usually follow a Laplacian or Gaussian law. We use here a zero-mean Laplacian distribution function to describe the statistical behavior of the pixels under both hypotheses. Thus the probability density function of $d$ is a mixture of Laplacians, whose parameters are estimated by the principle of Maximum Likelihood ([4], [6]).
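The Maximum Likelihood fit of the two-Laplacian mixture can be sketched with the standard EM iteration. This is an illustration under our own assumptions (pure Python, a fixed iteration count, hypothetical initial values, and the hypothetical function name `fit_laplacian_mixture`); the text only states that Maximum Likelihood is used. For a zero-mean Laplacian with rate $\lambda$, the weighted ML estimate of $1/\lambda$ is the responsibility-weighted mean of $|d|$, which is what the M-step computes.

```python
import math


def fit_laplacian_mixture(diffs, lam_static=1.0, lam_mobile=0.1,
                          w_static=0.5, n_iter=100):
    """EM fit of w * L(d; lam_static) + (1 - w) * L(d; lam_mobile).

    L(d; lam) = lam / 2 * exp(-lam * |d|) is a zero-mean Laplacian density.
    Initial values and n_iter are illustrative assumptions.
    Returns (w_static, lam_static, lam_mobile).
    """
    abs_d = [abs(d) for d in diffs]
    n = len(abs_d)
    for _ in range(n_iter):
        # E-step: posterior probability that each difference is "static"
        resp = []
        for a in abs_d:
            p0 = w_static * 0.5 * lam_static * math.exp(-lam_static * a)
            p1 = (1.0 - w_static) * 0.5 * lam_mobile * math.exp(-lam_mobile * a)
            resp.append(p0 / max(p0 + p1, 1e-300))
        # M-step: weighted ML estimates; 1/lam is the weighted mean of |d|
        n0 = sum(resp)
        n1 = n - n0
        w_static = n0 / n
        lam_static = n0 / sum(r * a for r, a in zip(resp, abs_d))
        lam_mobile = n1 / sum((1.0 - r) * a for r, a in zip(resp, abs_d))
    return w_static, lam_static, lam_mobile
```

Here `w_static` estimates the prior of the unchanged hypothesis; `lam_static` should end up larger than `lam_mobile`, since unchanged pixels concentrate near zero difference.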

An initial map of labeled sites is obtained using statistical tests. The first test detects changed sites with high confidence, that is, with a small probability of false alarm. Then a series of tests is used to find unchanged sites with high confidence, that is, with a small probability of non-detection.
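The exact form of the series of tests is not given here. As a simplified per-pixel illustration (our assumption, not the authors' tests), the fitted Laplacian tails yield closed-form thresholds. Under the static hypothesis $P(|d| > t) = e^{-\lambda_0 t}$, so a false-alarm probability $\alpha$ gives $t = -\ln(\alpha)/\lambda_0$. Under the mobile hypothesis $P(|d| < t) = 1 - e^{-\lambda_1 t}$, so a non-detection probability $\beta$ gives $t = -\ln(1-\beta)/\lambda_1$. The function name and the default $\alpha$, $\beta$ are hypothetical.

```python
import math


def initial_labels(diffs, lam_static, lam_mobile, alpha=1e-3, beta=0.05):
    """Per-pixel high-confidence labeling from Laplacian tail probabilities.

    Returns 1 (changed), 0 (unchanged) or None (left for label propagation).
    """
    # static hypothesis: P(|d| > t) = exp(-lam_static * t) = alpha
    t_changed = -math.log(alpha) / lam_static
    # mobile hypothesis: P(|d| < t) = 1 - exp(-lam_mobile * t) = beta
    t_unchanged = -math.log(1.0 - beta) / lam_mobile
    labels = []
    for d in diffs:
        if abs(d) > t_changed:
            labels.append(1)
        elif abs(d) < t_unchanged:
            labels.append(0)
        else:
            labels.append(None)
    return labels


print(initial_labels([0, 1, 3, 10, -12], lam_static=1.0, lam_mobile=0.05))
# [0, 0, None, 1, 1]
```

Sites left as `None` are the ones the multi-label propagation has to decide.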

A multi-label fast marching level set algorithm, presented in the next section, is then applied to all sets of initially labeled points. This algorithm is an extension of the well-known fast marching algorithm [10]. The contour of each region propagates according to a motion field which depends on the label and on the absolute inter-frame difference. The exact propagation velocity for the “unchanged” label is $v_0(x,y) = 1/(1 + \ldots)$ and for the “changed” label $v_1(x,y) = 1/(1 + \ldots)$, where $n$ is the number of neighbouring pixels already labeled with the same candidate label, and $\alpha$ takes a positive value if the pixel at the same site of the previous label map is an
interior point of a “changed” region, and zero otherwise. The parameters $\beta$, $\theta_0$, $\theta_1$ and $\zeta$ are adapted to the data.

3 Multi-label fast marching algorithm 

The fast marching level set algorithm introduced by Sethian [10] computes a constructive solution to the stationary level set equation $|\nabla T(x,y)| = 1/v(x,y)$, where $v(x,y)$ is the velocity of the moving front and $T(x,y)$ is a map of crossing times. The curves are advanced monotonically according to the propagation speed field.

The proposed multi-label version of the fast marching algorithm solves the same problem for any number of independent contours propagating with possibly different velocities, which “freeze” when they meet each other. In this approach two properties of each pixel are calculated: the arrival time, and the region or contour that first reached the pixel.

Our algorithm takes advantage of the fact that the fast marching algorithm sweeps the pixels in a time-advancing fashion, which limits redundant recalculations to the pixels of contact between contours. For each pixel a list of label candidacies is maintained. A candidacy can only be introduced by a neighboring pixel being fixed to a certain label; with a 4-neighborhood, it follows that no more than four candidacies may coexist per pixel. Moreover, multiple candidacies occur only in pixels on the border between two labels, so multiple recalculations of arrival times are rather rare. Finally, the label carrying the smallest arrival time is selected for every pixel.

We now present the new multi-label fast marching level set algorithm.

Initialize
    For each pixel p in the decision map
        If a decision exists for p
            Set the arrival time of p to zero
            For each neighboring pixel q lacking a decision
                - add the label of p to the list of label candidacies of q
                - mark it as trial
                - give an initial estimate of the arrival time
        Else
            Set the arrival time of p to infinity

Propagate
    While trial (not yet alive) label candidacies exist
        Select the trial candidacy c with the smallest arrival time
        Mark c as an alive label candidacy
        If no decision exists for the pixel p owning c
            Decide for p the label and arrival time of c
            For each undecided neighboring pixel q lacking a candidacy for the label of p
                - add the label of p to the list of label candidacies of q
                - mark it as trial
            For each neighboring pixel q containing a trial candidacy d for the label of c
                Recalculate the arrival time of d

To locate the candidacy with the smallest arrival time efficiently, a priority queue is used. Pixel candidacies themselves, being at most four, are kept in a linked list for ease of implementation. These facts indicate an execution cost of order $O(N \log N)$, where $N$ is the number of uninitialized pixels. Moreover, in practice the algorithm is expected to run in no more than twice the time of the traditional fast marching algorithm, regardless of the actual number of labels used.
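The Initialize/Propagate procedure can be sketched in Python as follows. This is our own minimal illustration, not the authors' implementation. It assumes a 4-neighbourhood, the standard first-order upwind update of $|\nabla T| = 1/v$ that uses only neighbours already decided with the same label, and a hypothetical `speed(label, y, x)` callback standing in for the label-dependent velocities $v_0$, $v_1$. Instead of a linked list per pixel, candidacies are kept in a dictionary keyed by (pixel, label). Recalculated candidacies leave stale heap entries, which are skipped when popped.

```python
import heapq
import math

INF = math.inf


def multilabel_fast_marching(seeds, speed):
    """seeds: 2-D list with a label (int) for initially decided pixels, else None.
    speed(label, y, x): positive front speed of that label at pixel (y, x).
    Returns (labels, arrival_times) as 2-D lists."""
    h, w = len(seeds), len(seeds[0])
    label = [row[:] for row in seeds]
    time = [[0.0 if label[y][x] is not None else INF for x in range(w)]
            for y in range(h)]
    cand = {}   # (y, x, lab) -> arrival time of a trial candidacy
    heap = []   # entries (time, y, x, lab); may contain stale entries

    def neighbours(y, x):
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w:
                yield ny, nx

    def t_of(y, x, lab):
        # arrival time of a pixel decided with label lab, else infinity
        if 0 <= y < h and 0 <= x < w and label[y][x] == lab:
            return time[y][x]
        return INF

    def solve(y, x, lab):
        # first-order upwind solution of |grad T| = 1 / v
        a = min(t_of(y, x - 1, lab), t_of(y, x + 1, lab))
        b = min(t_of(y - 1, x, lab), t_of(y + 1, x, lab))
        f = 1.0 / speed(lab, y, x)
        if abs(a - b) >= f:
            return min(a, b) + f
        return 0.5 * (a + b + math.sqrt(2.0 * f * f - (a - b) ** 2))

    def push(y, x, lab):
        # introduce a trial candidacy, or lower the time of an existing one
        t = solve(y, x, lab)
        if t < cand.get((y, x, lab), INF):
            cand[(y, x, lab)] = t
            heapq.heappush(heap, (t, y, x, lab))

    # Initialize: undecided neighbours of decided pixels get trial candidacies
    for y in range(h):
        for x in range(w):
            if label[y][x] is not None:
                for ny, nx in neighbours(y, x):
                    if label[ny][nx] is None:
                        push(ny, nx, label[y][x])

    # Propagate in order of increasing arrival time
    while heap:
        t, y, x, lab = heapq.heappop(heap)
        if cand.get((y, x, lab)) != t:
            continue                # stale entry
        del cand[(y, x, lab)]       # candidacy becomes alive
        if label[y][x] is not None:
            continue                # another label arrived first: front freezes
        label[y][x], time[y][x] = lab, t
        for ny, nx in neighbours(y, x):
            if label[ny][nx] is None:
                push(ny, nx, lab)   # introduce or recalculate this label's candidacy
    return label, time
```

For example, with seeds `[[0, None, None, None, 1]]` and label 1 moving twice as fast as label 0, the faster front claims the middle pixel and the labels become `[[0, 0, 1, 1, 1]]`.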

4 Moving Object Localization 

The change detection stage can be used to initialize the moving object tracker. The objective now is to localize the boundary of the moving object. The ideal change area is the union of the sites occupied by the object at two successive time instants. Writing $\{O(i,j,t)\}$ for the set of sites occupied by the object at time $t$, it can easily be shown that

$$C(t,t+1) \cap C(t,t-1) = \{O(i,j,t)\} \cup \left(\{O(i,j,t+1)\} \cap \{O(i,j,t-1)\}\right).$$

This means that the intersection of two successive change maps is a better initialization for moving object localization than either of them. In addition, whenever the object supports at $t-1$ and $t+1$ overlap only inside $\{O(i,j,t)\}$, we have $\{O(i,j,t)\} = C(t,t+1) \cap C(t,t-1)$. In Fig. 1 we give the initial position of the moving contours for the Trevor White sequence.
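The identity follows from the distributivity of intersection over union, since each ideal change area is $C(t,t\pm1) = \{O(i,j,t)\} \cup \{O(i,j,t\pm1)\}$. The short Python sketch below checks it on pixel sets; the function name and the object supports are hypothetical toy data.

```python
def intersect_change_maps(o_prev, o_cur, o_next):
    """C(t, t+1) & C(t, t-1), where each ideal change area is the union of
    the object supports (sets of pixel coordinates) at the two instants."""
    c_next = o_cur | o_next   # C(t, t+1)
    c_prev = o_cur | o_prev   # C(t, t-1)
    return c_next & c_prev


# A 3-pixel object moving 2 pixels per frame along one image row
o_prev = {(0, x) for x in range(0, 3)}
o_cur = {(0, x) for x in range(2, 5)}
o_next = {(0, x) for x in range(4, 7)}
print(intersect_change_maps(o_prev, o_cur, o_next) == o_cur)  # True
```

Because the supports at $t-1$ and $t+1$ do not overlap here, the intersection recovers the object support at $t$ exactly.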




Fig. 1. Detection of Moving Objects: Trevor White 



Knowing that change detection contains some errors, and that under some assumptions the intersection of the two change maps gives the object location, we propose to initialize a level set contour search algorithm with this map. This search is performed in two stages: first, an area containing the object’s boundary is extracted, and second, the boundary is detected.
The first objective is to determine the area which contains the object’s boundary with extremely high confidence. Because of errors resulting from the change detection stage, and because the initial boundary is, in principle, placed outside the object, an area large enough to contain the object’s boundary must be found. The task is simplified if some knowledge about the background is available. In the absence of such knowledge, the initial boundary can be relaxed in both directions, inside and outside, with a constant speed, which may differ between the two directions. The photometric boundary is then searched for within this area.

For cases where the background can be easily described, a level set approach extracts the zone of the object’s boundary. Let us suppose that the image intensity of the background can be described by a Gaussian random variable with mean $\mu$ and variance $\sigma^2$. This model could be locally adapted. For the Trevor White sequence used here to illustrate results, a global background distribution is assumed.

The propagation speed of the uncertain area depends on the label given by the initialization. For the inner border it is defined as $V_0 = c_0 + d_0 f(I)$, where $f(I) = 1/(1 + e^{\ldots})$, $I$ being the mean value of the intensity in a $3 \times 3$ window centered at the examined point. For the outer border the speed is defined as $V_b = d_b(1 - f(I))$. Thus a point on the inner border whose intensity is very different from that of the background advances with only the constant speed $c_0$. In contrast, the propagation of a point on the outer border is decelerated if its intensity is similar to that of the background. The width of the uncertain zone depends on the size of the detected objects.
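The exponent of $f(I)$ is not recoverable from this text. The following Python sketch therefore uses an assumed logistic form built from the Gaussian background model, $f(I) = 1/(1 + e^{((I-\mu)/\sigma)^2 - k})$, with a hypothetical offset $k$. It only reproduces the qualitative behavior described above: $f$ is close to 1 for background-like intensities and close to 0 otherwise. The constants $c_0$, $d_0$, $d_b$ default to illustrative values, and all function names are hypothetical.

```python
import math


def background_similarity(mean_intensity, mu, sigma, k=2.0):
    """Assumed form of f(I): near 1 for background-like intensities, near 0
    far from the Gaussian background model N(mu, sigma^2). k is hypothetical."""
    t = ((mean_intensity - mu) / sigma) ** 2 - k
    # numerically stable evaluation of 1 / (1 + exp(t))
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def inner_speed(mean_intensity, mu, sigma, c0=1.0, d0=1.0):
    # V_0 = c_0 + d_0 f(I): faster where the inner border still sees background
    return c0 + d0 * background_similarity(mean_intensity, mu, sigma)


def outer_speed(mean_intensity, mu, sigma, db=1.0):
    # V_b = d_b (1 - f(I)): slowed down where the outer border sees background
    return db * (1.0 - background_similarity(mean_intensity, mu, sigma))
```

With $\mu = 100$, $\sigma = 10$, a window mean of 200 (object-like) moves the inner border at about $c_0$, while a window mean of 100 (background-like) nearly stops the outer border.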

The last stage determines the boundary of the object based on the image gradient. The two extracted borders are propagated in opposite directions, the inner one outward and the outer one inward. The boundary is determined as the place of contact of the two borders. The propagation speed for both is $V = 1/(1 + \ldots)$; the parameters $\gamma$ and $\theta$ are adapted to the data. Thus the two borders propagate rapidly in “smooth” areas and are stopped on the boundaries of the object. Fig. 2 shows the same frames as Fig. 1 with the final localization result.

5 Conclusions 

In this article we first propose an extension of the fast marching algorithm that handles multiple labels for the propagating contours. This allows purely automatic boundary search methods and more robust results, as multiple labels are in competition. We have tested the new algorithm on the two-stage problem of change detection and moving object localization. Of course, it is possible, and sometimes sufficient, to limit the algorithm to only one of these stages. This is the case for telesurveillance applications, where change detection with a reference frame gives the location of the moving object. In a motion tracking application, the localization stage could be used to refine the tracking result. In any case, this article shows that it is possible to locate a moving object without motion estimation, which, if added, could further improve the already accurate results.

Fig. 2. Location of Moving Objects: Trevor White, panels (a) and (b)
Abstract. In this paper we present and briefly describe a user-friendly Windows system designed to assist with the analysis of images in general,
and biomedical images in particular. The system, which is being made 
publicly available to the research community, implements basic 2D image 
analysis operations based on partial differential equations (PDE’s). The 
system is under continuous development, and already includes a large 
number of image enhancement and segmentation routines that have been 
tested for several applications. 



1 Introduction 

Partial differential equations (PDE’s) are being used for image processing in 
general, and biomedical image processing in particular, with great success. The 
goal of this paper is to present a user friendly system developed under Windows 
NT/95/98 that implements and extends some of the most popular and useful 
algorithms based on this technique. The software package is in the process of 
being made publicly available to the research community. 

Some of the algorithms included in the system are: (a) Anisotropic diffu- 
sion [3,13]; (b) Curvature-based diffusion [1]; (c) Coherence enhancement [20]; 
(d) Vector-valued PDE’s [6,20]; (e) Geodesic active contours [5,10,14]; (f) Edge 
tracing [7,19]; (g) Fast numerics [15]. Both the original algorithms and new im- 
provements have been implemented. 

As examples, we describe two groups of operations implemented in the system, image enhancement and image segmentation; during the conference we will demonstrate the system with a number of examples from different imaging modalities.

* This work was supported by a grant from the Office of Naval Research ONR-N00014- 
97-1-0509, the Office of Naval Research Young Investigator Award, the Presidential 
Early Career Awards for Scientists and Engineers (PECASE), the National Science 
Foundation CAREER Award, and the National Science Foundation Learning and 
Intelligent Systems Program (LIS). 
The whole system is designed with two goals in mind: first, to assist researchers in the analysis of their data, and second, to allow its constant expansion and the introduction of new algorithms. Only a subset of the basic algorithms implemented in the system is described here; the package also includes a large number of user-friendly options that will be demonstrated at the conference. Details on the algorithms can be found in the cited references.



2 Image enhancement 

Image enhancement is essential both for analysis and for visualization, particularly for biomedical images. Our software package includes a large number of PDE-based image enhancement procedures, for both scalar and vectorial (e.g., color) data: directional diffusion, anisotropic diffusion, color anisotropic diffusion, coherence enhancement, and vectorial diffusion. We now describe two of the algorithms implemented in the package.

2.1 Directional (curvature-based) diffusion 

The algorithm in this section follows [1]. 

Let $I(x,y,0): \mathbb{R}^2 \to \mathbb{R}$ be the original image that we want to enhance.

The basic idea behind image enhancement via directional diffusion is to define a family of images $I(x,y,t): \mathbb{R}^2 \times [0,\tau) \to \mathbb{R}$ satisfying

$$\frac{\partial I}{\partial t} = g(\|\nabla I\|)\,\frac{\partial^2 I}{\partial \xi^2},$$

where $g(r) \to 0$ as $r \to \infty$ is an edge stopping function, and $\xi$ is a unit vector perpendicular to $\nabla I$. This flow is equivalent to

$$\frac{\partial I}{\partial t} = g(\|\nabla I\|)\,\kappa\,\|\nabla I\|,$$

where $\kappa$ is the curvature of the level sets of $I$. The flow processes the image in the direction of its edges, thereby preserving the basic edge information.
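A minimal explicit discretization of the directional flow can be sketched with NumPy. It uses $\partial^2 I/\partial\xi^2 = (I_{xx}I_y^2 - 2I_xI_yI_{xy} + I_{yy}I_x^2)/(I_x^2 + I_y^2)$, which equals $\kappa\|\nabla I\|$. The edge stopping function $g(r) = 1/(1 + (r/k)^2)$, the step size and the iteration count are illustrative assumptions, not necessarily what the package uses; the function name is hypothetical.

```python
import numpy as np


def curvature_diffusion(image, n_iter=10, dt=0.1, k=10.0, eps=1e-8):
    """Explicit scheme for dI/dt = g(|grad I|) * I_xixi (directional diffusion).

    I_xixi, the second derivative along xi (perpendicular to grad I), equals
    kappa * |grad I|. g(r) = 1 / (1 + (r/k)^2) is one common edge-stopping
    choice; k, dt and n_iter are illustrative values.
    """
    I = np.asarray(image, dtype=float).copy()
    for _ in range(n_iter):
        Iy, Ix = np.gradient(I)          # axis 0 = rows (y), axis 1 = cols (x)
        Ixx = np.gradient(Ix, axis=1)
        Iyy = np.gradient(Iy, axis=0)
        Ixy = np.gradient(Ix, axis=0)
        grad2 = Ix ** 2 + Iy ** 2
        I_xixi = (Ixx * Iy ** 2 - 2.0 * Ix * Iy * Ixy + Iyy * Ix ** 2) / (grad2 + eps)
        g = 1.0 / (1.0 + grad2 / k ** 2)
        I += dt * g * I_xixi
    return I
```

A straight edge has zero level-set curvature and is left unchanged, whereas the corners of a bright square are rounded off, which is the edge-preserving behavior described above.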

2.2 Robust anisotropic diffusion

The algorithm in this section follows [3,13]. 

One of the most popular PDE-based algorithms for image enhancement is the anisotropic diffusion scheme pioneered by Perona and Malik. Our system includes these equations and the later improvements developed by Black et al. Letting $I(x,y,t): \mathbb{R}^2 \times [0,\tau) \to \mathbb{R}$ be the deforming image, with the original image as initial condition, the image enhancement flow is obtained from the gradient descent of

$$\int_\Omega \rho(\|\nabla I\|)\,d\Omega,$$

which is given by

$$\frac{\partial I}{\partial t} = \operatorname{div}\left(\rho'(\|\nabla I\|)\,\frac{\nabla I}{\|\nabla I\|}\right),$$

where $\rho$ is, for example, the Lorentzian or Tukey’s biweight robust function.
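A discrete version of this flow can be written in the 4-neighbor form associated with Black et al., $I_s \leftarrow I_s + \frac{\lambda}{4}\sum_{p \in \eta_s} \psi(I_p - I_s, \sigma)$, where $\psi = \rho'$ is the influence function. The NumPy sketch below uses that form; the step `lam`, the scale `sigma` and the function names are our assumptions for illustration, not necessarily what the package implements. With Tukey's biweight, differences larger than `sigma` exert no influence, so strong edges are left intact.

```python
import numpy as np


def lorentzian_psi(x, sigma):
    """Influence function of the Lorentzian rho(x) = log(1 + (x/sigma)^2 / 2)."""
    return 2.0 * x / (2.0 * sigma ** 2 + x ** 2)


def tukey_psi(x, sigma):
    """Influence function of Tukey's biweight (up to a constant); zero for |x| > sigma."""
    return np.where(np.abs(x) <= sigma, x * (1.0 - (x / sigma) ** 2) ** 2, 0.0)


def robust_anisotropic_diffusion(image, psi, sigma, n_iter=20, lam=0.25):
    """Explicit 4-neighbour robust diffusion with influence function psi."""
    I = np.asarray(image, dtype=float).copy()
    for _ in range(n_iter):
        P = np.pad(I, 1, mode="edge")   # replicated borders: no flux across them
        flux = (psi(P[:-2, 1:-1] - I, sigma)     # north neighbours
                + psi(P[2:, 1:-1] - I, sigma)    # south neighbours
                + psi(P[1:-1, 2:] - I, sigma)    # east neighbours
                + psi(P[1:-1, :-2] - I, sigma))  # west neighbours
        I += (lam / 4.0) * flux
    return I
```

Small fluctuations (below `sigma`) are smoothed, while a step of height well above `sigma` is an outlier for the robust function and is preserved.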



3 Segmentation 

One of the most commonly used approaches to segment objects, particularly in medical images, is active contours, or snakes [8,17]. This technique is based on deforming a curve toward the minimization of a given energy. This energy is mainly composed of two terms, one attracting the curve to the object boundaries, and the other addressing regularization properties of the deforming curve. In [4,5], it was shown that a re-interpretation of the classical snakes model leads to the formulation of the segmentation problem as the minimization of a weighted length given by

$$\int_0^{L(C)} g(\|\nabla I(C(s))\|)\,ds, \qquad (1)$$

where $C: \mathbb{R} \to \mathbb{R}^2$ is the deforming curve, $I: \mathbb{R}^2 \to \mathbb{R}$ the image, $ds$ stands for the curve arc-length ($\|\partial C/\partial s\| = 1$), $\nabla(\cdot)$ stands for the gradient, and $g(\cdot)$ is such that $g(r) \to 0$ while $r \to \infty$ (the “edge detector”). This model means that finding the object boundaries is equivalent to computing a path of minimal weighted distance, a geodesic curve, with weight given by $g(\cdot)$ (see also [10,16,21]). This model not only improves classical snakes, but also provides a formal mathematical framework that connects previous models (e.g., [8] and [11]); see [5] for details.
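A direct discretization of the weighted length (1) for a closed polygonal curve can be sketched as follows. Sampling $g$ at the pixel nearest to each edge midpoint is our simplifying assumption. `g_map` is assumed to hold precomputed values of $g(\|\nabla I\|)$, and the function name is hypothetical.

```python
import math


def weighted_length(polygon, g_map):
    """Discrete approximation of the weighted length (1) of a closed polygon.

    polygon: list of (row, col) vertices; g_map: 2-D list of g(|grad I|)
    values per pixel. Each edge contributes its Euclidean length times g
    sampled at the pixel nearest to the edge midpoint.
    """
    total = 0.0
    n = len(polygon)
    for k in range(n):
        (y0, x0), (y1, x1) = polygon[k], polygon[(k + 1) % n]
        my, mx = round((y0 + y1) / 2), round((x0 + x1) / 2)
        total += g_map[my][mx] * math.hypot(y1 - y0, x1 - x0)
    return total
```

With $g \equiv 1$ this reduces to the ordinary perimeter; small values of $g$ along strong edges make curves lying on them "short", which is why the geodesic snaps to object boundaries.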

There are two main techniques to find the geodesic curve, that is, the min- 
imizer of (1). Both are part of the system we have developed, and are briefly 
described now. 



3.1 Curve evolution approach 

The algorithm in this section follows [5,10,14]. 

This technique is based on computing the gradient descent of (1): starting from a closed curve either inside or outside the object, the curve is deformed toward a (possibly local) minimum, finding a geodesic curve. This approach gives a curve evolution flow of the form

$$\frac{\partial C}{\partial t} = g\,\kappa\,\mathcal{N} - (\nabla g \cdot \mathcal{N})\,\mathcal{N}, \qquad (2)$$

where $\kappa$ and $\mathcal{N}$ are the Euclidean curvature and the Euclidean unit normal, respectively (additional velocities can be added as well, and they are part of our implementation). This was the approach followed in [5,10], inspired by [11], where the model was first introduced. The implementation is based on the numerical technique developed by Osher and Sethian [12]. This model gives a completely automatic segmentation procedure (modulo initialization). This approach works very well for images that are not extremely noisy. For extremely noisy images, like the neuron data presented in the examples section, spurious objects are detected, and it is left to the user to eliminate them manually. In addition, since the boundary might be very weak, it is not always detected, and an initialization very close to the goal might then be required. This motivates the next approach.



3.2 Geodesic edge tracing 

The algorithm in this section follows [7,19]. 

This technique of solving (1) is based on connecting a few points marked by the user on the neuron’s boundary, while keeping the weighted length (1) to a minimum. This was developed in [7]. In contrast with the technique described above, this approach always needs user intervention to mark the initial points. On the other hand, for very noisy images, it permits a better handling of the noise.

We now describe the algorithm used to compute the minimal weighted path between points on the object's boundary. That is, given a set of boundary points $\mathcal{P}_1, \ldots, \mathcal{P}_N$ and following (1), we have to find the $N$ curves that minimize ($\mathcal{P}_{N+1} = \mathcal{P}_1$)

$$d(I(\mathcal{P}_i), I(\mathcal{P}_{i+1})) := \int_{\mathcal{P}_i}^{\mathcal{P}_{i+1}} g(\|\nabla I\|)\,ds. \qquad (3)$$

The algorithm is composed of three main steps: 1- image regularization, 2- computation of equal distance contours, 3- back propagation. We briefly describe each of these steps now.



Image regularization. As in the curve evolution approach, the image is first enhanced (noise removal and edge enhancement), using the PDE-based algorithms described before. The result of this step is the image $I$ (working on subsampled data, following [19], is also part of the software package).



Equal distance contours computation. After the image $I$ is computed, we have to compute, for every point $\mathcal{P}_i$, the weighted distance map, according to the weighted distance $d$. That is, we have to compute the function

$$\mathcal{D}_i(x,y) := d(I(\mathcal{P}_i), I(x,y)),$$

or in words, the weighted distance between the pair of image points $\mathcal{P}_i$ and $(x,y)$.

There are basically two ways of making this computation: computing equal distance contours, or directly computing $\mathcal{D}_i$. We briefly describe each of these now.

Equal distance contours $\mathcal{C}_i$ are curves such that all the points on the contour have the same distance $d$ to $\mathcal{P}_i$. That is, the curves $\mathcal{C}_i$ are the level sets, or isophotes, of $\mathcal{D}_i$. It is easy to see [7] that, following the definition of $d$, these contours are obtained as the solution of the curve evolution flow

$$\frac{\partial \mathcal{C}_i(x,y,t)}{\partial t} = \frac{1}{g(\|\nabla I\|)}\,\mathcal{N},$$

where $\mathcal{N}$ is in this case the outer unit normal to $\mathcal{C}_i(x,y,t)$. This type of flow should be implemented using the standard level set method [12].

A different approach is based on the fact that the distance function $\mathcal{D}_i$ satisfies the following Hamilton-Jacobi equation [9,15,18]:

$$\frac{1}{g(\|\nabla I\|)}\,\|\nabla \mathcal{D}_i\| = 1.$$

Optimal numerical techniques have been proposed to solve this static Hamilton-Jacobi equation [9,15,18]. Because of this optimality, this is the approach we follow in our software package. At the end of this step, we have $\mathcal{D}_i$ for each point $\mathcal{P}_i$. Note that we do not need to compute $\mathcal{D}_i$ over the whole image plane: it is enough to stop the computation when the value at $\mathcal{P}_{i+1}$ is obtained.



Back propagation. After the distance functions $\mathcal{D}_i$ are computed, we have to trace the actual minimal path between $\mathcal{P}_i$ and $\mathcal{P}_{i+1}$ that minimizes $d$. Once again it is easy to show (see for example [9,15]) that this path should be perpendicular to the level curves $\mathcal{C}_i$ of $\mathcal{D}_i$, and therefore tangent to $\nabla \mathcal{D}_i$. The path is then computed by backtracking from $\mathcal{P}_{i+1}$ along the gradient direction, descending $\mathcal{D}_i$, until we return to the point $\mathcal{P}_i$. This back propagation is guaranteed to converge to the point $\mathcal{P}_i$, and then gives the path of minimal weighted distance. We have implemented both a full back propagation scheme and a discrete one that just looks at the neighboring pixels.
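The discrete back propagation can be illustrated with a graph-based stand-in for the Hamilton-Jacobi solver. Dijkstra's algorithm on the 8-connected pixel grid approximates $\mathcal{D}_i$; this is not the optimal continuous solver of [9,15,18], and its paths show a grid bias. The path is then recovered by repeatedly stepping to the neighbor that explains the current distance value. The helper names are hypothetical, and `g` is assumed to be a precomputed, strictly positive map of $g(\|\nabla I\|)$. The early stop at the target mirrors the remark that $\mathcal{D}_i$ need not be computed over the whole plane.

```python
import heapq
import math

STEPS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def step_cost(g, y0, x0, y1, x1):
    # mean weight of the two pixels times the Euclidean step length
    return 0.5 * (g[y0][x0] + g[y1][x1]) * math.hypot(y1 - y0, x1 - x0)


def weighted_distance(g, source, target=None):
    """Dijkstra approximation of D_i on the 8-connected grid (g > 0).
    Stops as soon as target (if given) has its final value."""
    h, w = len(g), len(g[0])
    dist = [[math.inf] * w for _ in range(h)]
    sy, sx = source
    dist[sy][sx] = 0.0
    heap = [(0.0, sy, sx)]
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y][x]:
            continue                      # stale queue entry
        if target is not None and (y, x) == tuple(target):
            break
        for dy, dx in STEPS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + step_cost(g, y, x, ny, nx)
                if nd < dist[ny][nx]:
                    dist[ny][nx] = nd
                    heapq.heappush(heap, (nd, ny, nx))
    return dist


def back_propagate(g, dist, source, target):
    """Discrete back propagation: from target, repeatedly move to the neighbour q
    minimising dist[q] + step_cost(q, current) until source is reached."""
    h, w = len(dist), len(dist[0])
    y, x = target
    path = [(y, x)]
    while (y, x) != tuple(source):
        _, y, x = min((dist[y + dy][x + dx] + step_cost(g, y + dy, x + dx, y, x),
                       y + dy, x + dx)
                      for dy, dx in STEPS
                      if 0 <= y + dy < h and 0 <= x + dx < w)
        path.append((y, x))
    return path[::-1]
```

Each step moves to a neighbor with strictly smaller distance, so the walk always terminates at the source. On a map where one row has a much lower $g$ than its surroundings, the recovered path follows that row, as a traced edge would.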

4 Concluding remarks 

In this paper we introduced a system for image analysis via PDE’s. Some of the algorithms implemented in our package have been shown to outperform commercially available packages that perform similar operations. For example, we have shown [19] that the edge tracing algorithm normally outperforms the one in PictureIT, Microsoft’s image processing package. As mentioned in the introduction, the system will be available to the research community. The system is under constant development, and additional features, like an improvement of the tracking scheme introduced in [2], are expected to be available soon.
Abstract. Segmentation based on color, instead of intensity only, provides an easier distinction between materials, on the condition that robustness against irrelevant parameters is achieved, such as illumination
source, shadows, geometry and camera sensitivities. Modeling the phys- 
ical process of the image formation provides insight into the effect of 
different parameters on object color. 

In this paper, a color differential geometry approach is used to detect 
material edges, invariant with respect to illumination color and imaging 
conditions. The performance of the color invariants is demonstrated by 
some real-world examples, showing the invariants to be successful in 
discounting shadow edges and illumination color. 



1 Introduction 

Color is a powerful cue in the distinction between objects. Segmentation based
on color, instead of intensity only, provides an easier discrimination between col- 
ored regions. It is well known that values obtained by a color camera are affected 
by the specific imaging conditions, such as illumination color, shadow and ge- 
ometry, and sensor sensitivity. Therefore, object properties independent of the 
imaging conditions should be derived from the measured color values. Modeling 
the physical process of the image formation provides insight into the effect of 
different parameters on object color [4,5,10,12]. We consider the determination 
of material changes, independent of the illumination color and intensity, camera 
sensitivities, and geometric parameters such as shadow, orientation and scale.

When considering the estimation of material properties on the basis of local 
measurements, differential equations constitute a natural framework to describe 
the physical process of image formation. A well known technique from scale-space 
theory is the convolution of a signal with a derivative of the Gaussian kernel to 
obtain the derivative of the signal [8]. The introduction of wavelength in the scale-space paradigm leads to a spatio-spectral family of Gaussian aperture functions,
introduced in [2] as the Gaussian color model. As a result, measurements from 
color images of analytically derived differential expressions may be obtained by 
applying the Gaussian color model. Thus, the model defines how to measure 
material properties as derived from the photometric model. 



M. Nielsen et al. (Eds.): Scale-Space’99, LNCS 1682, pp. 459-464, 1999.
© Springer-Verlag Berlin Heidelberg 1999







J.-M. Geusebroek et al. 



In this paper, the problem of determining material changes independent of 
the illumination color and intensity is addressed. Additionally, robustness against 
changes in the imaging conditions is considered, such as camera viewpoint, illu- 
mination direction and sensor sensitivities and gains. The problem is approached 
by considering a Lambertian reflectance model, leading to differential expressions 
which are robust to a change in imaging conditions. The performance of these 
color invariants is demonstrated on a real-world scene of colored objects, and on 
transmission microscopic preparations. 



2 Determination of Object Borders 

Any method for finding invariant color properties relies on a photometric model 
and on assumptions about the physical variables involved. For example, hue and 
saturation are well known object properties for matte, dull surfaces, illuminated 
by white light [5]. Normalized rgb is known to be insensitive to surface orien- 
tation, illumination direction and intensity, under a white illumination. When 
the illumination color varies or is not white, other object properties which are 
related to constant physical parameters should be measured. In this section, 
expressions for determining material changes in images will be derived, under 
the assumption that the scene is uniformly illuminated by a colored source, and 
taking into account the Lambertian photometric model. 

Consider a homogeneously colored material patch illuminated by incident light with spectral distribution $e(\lambda)$. When assuming Lambertian reflectance, the spectrum reflected by the material in the viewing direction $v$, ignoring secondary scattering after internal boundary reflection, is given by [7,13]

$$E(\lambda) = e(\lambda)\,(1 - \rho_f(n,s,v))^2\,R_\infty(\lambda), \qquad (1)$$

where $n$ is the surface patch normal, $s$ the direction of the illumination source, $\rho_f$ the Fresnel front surface reflectance coefficient in the viewing direction, and $R_\infty$ denotes the body reflectance.

Because of the projection of the energy distribution on the image plane, the vectors $n$, $s$ and $v$ depend on the position in the imaging plane. The energy of the incoming spectrum at a point $x$ on the image plane is then related to

$$E(\lambda,x) = e(\lambda,x)\,(1 - \rho_f(x))^2\,R_\infty(\lambda,x), \qquad (2)$$

where the spectral distribution at each point $x$ is generated by a specific material patch.

Consider the photometric reflection model (2) and an illumination with locally constant color. Hence, the illumination may be decomposed into a spectral component $e(\lambda)$ representing the illumination color, and a spatial component $i(x)$ denoting the illumination intensity, resulting in

$$E(\lambda,x) = e(\lambda)\,i(x)\,(1 - \rho_f(x))^2\,R_\infty(\lambda,x). \qquad (3)$$
The aim is to derive expressions describing material changes independent of the illumination. Without loss of generality, we restrict ourselves to the one-dimensional case; two-dimensional expressions will be derived later. The procedure of deriving material properties can be formulated as finding expressions that depend only on the material parameters of the given physical model.

Differentiation of (3) with respect to $\lambda$ results in

$$\frac{\partial E}{\partial \lambda} = \frac{\partial e}{\partial \lambda}\,i(x)\,(1 - \rho_f(x))^2\,R_\infty(\lambda,x) + e(\lambda)\,i(x)\,(1 - \rho_f(x))^2\,\frac{\partial R_\infty}{\partial \lambda}. \qquad (4)$$

Dividing (4) by (3) gives the relative differential,

$$\frac{1}{E}\frac{\partial E}{\partial \lambda} = \frac{1}{e(\lambda)}\frac{\partial e}{\partial \lambda} + \frac{1}{R_\infty(\lambda,x)}\frac{\partial R_\infty}{\partial \lambda}. \qquad (5)$$

The result consists of two terms, the former depending on the illumination color only and the latter depending on the body reflectance. Since the illumination color term depends on $\lambda$ only, differentiation with respect to $x$ yields a reflectance property.



Lemma 1. Assuming matte, dull surfaces and an illumination with locally constant color,

$$\mathcal{N}_{\lambda x} = \frac{\partial}{\partial x}\left\{\frac{1}{E}\frac{\partial E}{\partial \lambda}\right\} \qquad (6)$$

determines material changes independent of the viewpoint, surface orientation, illumination direction, illumination intensity and illumination color.

Proof. See (4)-(5). Further, the reflectivity $R_\infty$ and its derivative with respect to $\lambda$ depend on the material characteristics only, that is, on the material absorption and scattering coefficients. Hence, the spatial derivative of their product is determined by material transitions. □

Note that Lemma 1 holds whenever Fresnel (mirror) reflectance is negligible, thus in the absence of interreflections and specularities. The expression given by (6) is the fundamental lowest order illumination invariant. Any spatio-spectral derivative of (6) inherently depends on the body reflectance only. According to [11], a complete and irreducible set of differential invariants is obtained by taking all higher order derivatives of the fundamental invariant.



Proposition 2. Assuming matte, dull surfaces and an illumination with locally constant color, $N$ is a complete set of irreducible invariants, independent of the viewpoint, surface orientation, illumination direction, illumination intensity and illumination color:

$$N = \frac{\partial^{n+m}}{\partial\lambda^n\,\partial x^m}\left\{\frac{1}{E}\frac{\partial E}{\partial \lambda}\right\} \qquad (7)$$

for $m \ge 1$, $n \ge 0$.

These invariants may be interpreted as the spatial derivatives of the normalized slope ($N_\lambda$) and curvature ($N_{\lambda\lambda}$) of the reflectance function $R_\infty$.
3 Measurement of Spatio-Spectral Energy

So far, we have established invariant expressions describing material changes under different illuminations. These are formal expressions, assumed to be measurable at an infinitesimally small spatial resolution and spectral bandwidth. The physical measurement of electromagnetic energy inherently implies integration over a certain spatial extent and spectral bandwidth. In this section, we describe physically realizable measurement of spatio-spectral energy distributions. We emphasize that we do not propose an essentially new color model here, but rather a theory of color measurement. The specific choice of color representation, often referred to as color coordinates or color model, is irrelevant for our purpose.

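Let $E(\lambda)$ be the energy distribution of the incident light, and let $G(\lambda_0;\sigma_\lambda)$ be the Gaussian at spectral scale $\sigma_\lambda$ positioned at $\lambda_0$. Measurement of the spectral energy distribution with a Gaussian aperture yields a weighted integration over the spectrum. At infinitely small spatial resolution, the observed energy in the Gaussian color model approaches, to second order [2,9],

$$E^{\lambda_0,\sigma_\lambda}(\lambda) = E^{\lambda_0,\sigma_\lambda} + \lambda\,E_\lambda^{\lambda_0,\sigma_\lambda} + \tfrac{1}{2}\lambda^2 E_{\lambda\lambda}^{\lambda_0,\sigma_\lambda} + \dots \qquad (8)$$

$$E^{\lambda_0,\sigma_\lambda} = \int E(\lambda)\,G(\lambda;\lambda_0,\sigma_\lambda)\,d\lambda \qquad (9)$$

$$E_\lambda^{\lambda_0,\sigma_\lambda} = \int E(\lambda)\,G_\lambda(\lambda;\lambda_0,\sigma_\lambda)\,d\lambda \qquad (10)$$

$$E_{\lambda\lambda}^{\lambda_0,\sigma_\lambda} = \int E(\lambda)\,G_{\lambda\lambda}(\lambda;\lambda_0,\sigma_\lambda)\,d\lambda \qquad (11)$$

where $G_\lambda(\cdot)$ and $G_{\lambda\lambda}(\cdot)$ denote derivatives of the Gaussian with respect to $\lambda$.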

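Definition 3. The Gaussian color model measures, up to the 2nd order, the coefficients $E^{\lambda_0,\sigma_\lambda}$, $E_\lambda^{\lambda_0,\sigma_\lambda}$ and $E_{\lambda\lambda}^{\lambda_0,\sigma_\lambda}$ of the Taylor expansion of the Gaussian-weighted spectral energy distribution at $\lambda_0$ [9].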
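Equations (9)-(11) can be approximated numerically for a sampled spectrum. The sketch below assumes a uniform wavelength grid and uses trapezoidal integration. The function name and the default parameters $\lambda_0 = 515$ nm and $\sigma_\lambda = 55$ nm (the human-vision setting quoted in this section) are our choices, not part of the model. As in the text, $G_\lambda$ and $G_{\lambda\lambda}$ are derivatives with respect to $\lambda$. Consequently, a spectrum that increases linearly gives a negative $E_\lambda$: the sign is flipped relative to the slope, as with any correlation against a derivative kernel.

```python
import numpy as np

def _trapezoid(f, x):
    """Trapezoidal rule for samples f on the grid x."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def gaussian_color_model(wavelengths, energy, lambda0=515.0, sigma=55.0):
    """Return (E, E_lambda, E_lambdalambda) following eqs. (9)-(11)."""
    d = wavelengths - lambda0
    g = np.exp(-0.5 * (d / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    g_l = -d / sigma**2 * g                          # dG/dlambda
    g_ll = (d**2 / sigma**4 - 1.0 / sigma**2) * g    # d^2G/dlambda^2
    return (_trapezoid(energy * g, wavelengths),
            _trapezoid(energy * g_l, wavelengths),
            _trapezoid(energy * g_ll, wavelengths))

lam = np.arange(100.0, 1001.0)                        # 1 nm sampling, wide enough for the tails
print(gaussian_color_model(lam, np.ones_like(lam)))   # approximately (1, 0, 0)
print(gaussian_color_model(lam, lam))                 # approximately (515, -1, 0)
```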



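Introducing spatial extent into the Gaussian color model yields a local Taylor expansion at wavelength $\lambda_0$ and position $x_0$ [2]. Each measurement of a spatio-spectral energy distribution has a spatial as well as a spectral resolution. The measurement is obtained by probing an energy density volume in a three-dimensional spatio-spectral space, where the size of the probe is determined by the observation scales $\sigma_\lambda$ and $\sigma_x$,

$$E(\lambda,x) = E + \begin{pmatrix} x & \lambda \end{pmatrix}\begin{pmatrix} E_x \\ E_\lambda \end{pmatrix} + \frac{1}{2}\begin{pmatrix} x & \lambda \end{pmatrix}\begin{pmatrix} E_{xx} & E_{x\lambda} \\ E_{\lambda x} & E_{\lambda\lambda} \end{pmatrix}\begin{pmatrix} x \\ \lambda \end{pmatrix} + \dots \qquad (12)$$

where

$$E_{x^i\lambda^j}(\lambda,x) = E(\lambda,x) * G_{x^i\lambda^j}(\lambda,x;\sigma_x). \qquad (13)$$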
Here, $G_{x^i\lambda^j}(\lambda,x;\sigma_x)$ are the spatio-spectral probes, or color receptive fields. The coefficients of the Taylor expansion of $E(\lambda,x)$ represent the local image structure completely. Truncating the Taylor expansion gives an approximate representation, which is the best possible in the least-squares sense [8].

For human vision, the Taylor expansion is spectrally truncated at second order [6]. Hence, higher order derivatives do not affect color as observed by the human visual system. The Gaussian color model approximates the Hering basis for human color vision when taking the parameters $\lambda_0 \approx 515$ nm and $\sigma_\lambda \approx 55$ nm [2]. Again, this approximation is optimal in the least-squares sense.

For an RGB camera, principal component analysis of all triplets results in a decomposition of the image that is independent of camera gains and dark current. The principal components may be interpreted as the intensity of the underlying spectral distribution and its first- and second-order derivatives, which describe the largest and second-largest variation in the distribution. Hence, the principal components of the RGB values denote the spectral derivatives as approximated by the camera sensor sensitivities.
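The principal component decomposition of RGB triplets described above can be sketched with NumPy's symmetric eigensolver. The sketch assumes the image is stored as an H × W × 3 array, and the function name is ours. Subtracting the mean triplet removes any constant per-channel offset, such as dark current. The code does not check whether the leading components actually correspond to intensity, slope and curvature of the spectrum; that depends on the camera sensitivities, as stated in the text.

```python
import numpy as np

def rgb_principal_components(image):
    """Principal components of all RGB triplets of an H x W x 3 image.

    Returns eigenvalues (largest first), eigenvectors as columns, and the
    mean-centred pixels projected onto those eigenvectors (same shape as image).
    """
    pixels = np.asarray(image, dtype=float).reshape(-1, 3)
    centered = pixels - pixels.mean(axis=0)        # removes constant per-channel offsets
    cov = centered.T @ centered / (len(centered) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)         # ascending order for a symmetric matrix
    order = np.argsort(eigvals)[::-1]              # largest variation first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projected = (centered @ eigvecs).reshape(np.shape(image))
    return eigvals, eigvecs, projected

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
vals, vecs, comps = rgb_principal_components(img)
vals_offset, _, _ = rgb_principal_components(img + 0.1)   # constant dark-current-like offset
print(vals, np.allclose(vals, vals_offset))
```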

In conclusion, measuring spatio-spectral energy implies probing the energy distribution with Gaussian apertures at a given observation scale. The human visual system measures the intensity, slope and curvature of the spectral energy distribution at fixed $\lambda_0$ and fixed $\sigma_\lambda$. Hence, the spectral intensity and its first and second order derivatives, combined with the spatial derivatives up to a given order, describe the local structure of a color image.



4 Results 



Geometrical invariants are obtained by combining the color invariants $N_\lambda$ and $N_{\lambda\lambda}$ in the polynomial expressions proposed by Florack et al. [3]. For example, the first order spatial derivatives yield the edge detectors

$$\sqrt{N_{\lambda x}^2 + N_{\lambda y}^2} \quad\text{and}\quad \sqrt{N_{\lambda\lambda x}^2 + N_{\lambda\lambda y}^2}. \qquad (14)$$

Figure 1a-c shows the result of applying the edge detector $\sqrt{N_{\lambda x}^2 + N_{\lambda y}^2}$ under different illuminants.
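The edge detector can be sketched from two spectral channels, $E$ and $E_\lambda$ (obtained, for example, as in (9)-(10)), with $N_\lambda = E_\lambda/E$. This minimal NumPy sketch assumes the two channels are given as 2-D arrays. It convolves them with sampled Gaussian derivative kernels using reflective borders, and applies the quotient rule $N_{\lambda x} = (E_{\lambda x}E - E_\lambda E_x)/E^2$ to the smoothed measurements. Kernel radius, border handling and all names are our assumptions. When both channels share the same spatially varying intensity, $N_\lambda$ is piecewise constant, so the response vanishes away from the material transition.

```python
import numpy as np

def gaussian_kernels(sigma):
    """Sampled, normalized 1-D Gaussian and its first derivative."""
    radius = int(np.ceil(4.0 * sigma))
    t = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-0.5 * (t / sigma) ** 2)
    g /= g.sum()
    dg = -t / sigma**2 * g
    return g, dg

def convolve_axis(img, kernel, axis):
    """1-D convolution along one axis with reflective borders; output keeps img's shape."""
    r = len(kernel) // 2
    pad = [(0, 0)] * img.ndim
    pad[axis] = (r, r)
    padded = np.pad(img, pad, mode="reflect")
    return np.apply_along_axis(lambda m: np.convolve(m, kernel, mode="valid"), axis, padded)

def gaussian_derivative(img, sigma, dx=0, dy=0):
    """Smoothed image, or its first derivative along x (axis 1) and/or y (axis 0)."""
    g, dg = gaussian_kernels(sigma)
    out = convolve_axis(img, dg if dx else g, axis=1)
    return convolve_axis(out, dg if dy else g, axis=0)

def invariant_edge_strength(E, E_lam, sigma=1.0):
    """sqrt(N_lx^2 + N_ly^2) with N_l = E_lam / E, using the quotient rule."""
    E0 = gaussian_derivative(E, sigma)
    L0 = gaussian_derivative(E_lam, sigma)
    N_lx = (gaussian_derivative(E_lam, sigma, dx=1) * E0
            - L0 * gaussian_derivative(E, sigma, dx=1)) / E0**2
    N_ly = (gaussian_derivative(E_lam, sigma, dy=1) * E0
            - L0 * gaussian_derivative(E, sigma, dy=1)) / E0**2
    return np.sqrt(N_lx**2 + N_ly**2)

# Toy scene: intensity ramp times a material whose spectral slope jumps at column 16.
cols = np.arange(32)
intensity = np.tile(1.0 + 0.5 * cols / 31.0, (32, 1))
slope = np.where(cols < 16, 0.1, 0.5)[None, :]
E = intensity                # zeroth-order spectral measurement
E_lam = intensity * slope    # first-order spectral measurement
strength = invariant_edge_strength(E, E_lam)
```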

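Color edges can be detected by examining the directional derivatives in the color gradient direction [1], that is, by solving for

$$N_{\lambda ww} = \frac{N_{\lambda x}^2 N_{\lambda xx} + 2 N_{\lambda x} N_{\lambda y} N_{\lambda xy} + N_{\lambda y}^2 N_{\lambda yy}}{N_{\lambda x}^2 + N_{\lambda y}^2} = 0, \qquad N_{\lambda w} = \sqrt{N_{\lambda x}^2 + N_{\lambda y}^2} \ge \alpha, \qquad (15)$$

and similarly for $N_{\lambda\lambda}$. Salient edges are determined by the value of $\alpha$. An example is shown in Fig. 1d.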
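Once the spatial derivatives of $N_\lambda$ are available (computed, for example, with Gaussian derivative filters), (15) reduces to a few array operations. This sketch is an illustration under our own assumptions. It evaluates $N_{\lambda ww}$, marks a sign change between horizontally or vertically adjacent pixels as a zero crossing, and keeps only crossings where the gradient magnitude $N_{\lambda w}$ reaches the threshold $\alpha$. The small `eps` guard against division by zero in flat regions is our addition.

```python
import numpy as np

def second_derivative_along_gradient(Nx, Ny, Nxx, Nxy, Nyy, eps=1e-12):
    """N_ww = (Nx^2 Nxx + 2 Nx Ny Nxy + Ny^2 Nyy) / (Nx^2 + Ny^2), as in (15)."""
    grad_sq = Nx**2 + Ny**2
    return (Nx**2 * Nxx + 2.0 * Nx * Ny * Nxy + Ny**2 * Nyy) / np.maximum(grad_sq, eps)

def salient_zero_crossings(Nww, Nw, alpha):
    """Pixels where N_ww changes sign to the right or lower neighbour and N_w >= alpha."""
    sign = np.sign(Nww)
    crossing = np.zeros(Nww.shape, dtype=bool)
    crossing[:, :-1] |= sign[:, :-1] * sign[:, 1:] < 0   # horizontal neighbours
    crossing[:-1, :] |= sign[:-1, :] * sign[1:, :] < 0   # vertical neighbours
    return crossing & (Nw >= alpha)

Nww = np.array([[1.0, -1.0, -1.0],
                [1.0, 1.0, -1.0]])
Nw = np.array([[1.0, 1.0, 0.1],
               [1.0, 0.2, 1.0]])
edges = salient_zero_crossings(Nww, Nw, alpha=0.5)
print(edges)
```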
Fig. 1. Illumination invariant edges for epithelial tissue (a) visualized by transmission light microscopy. Edges $N_{\lambda w} = \sqrt{N_{\lambda x}^2 + N_{\lambda y}^2}$ are shown for (b) a white illumination (halogen 3400K) and (c) a reddish illumination (halogen 2450K). Despite the different illuminants, edge strength is comparable. Figure 1d shows zero crossing detection in an image of colored objects: in white, the $N_{\lambda ww}$ crossings (bluish-yellow edges); in black, the $N_{\lambda\lambda ww}$ crossings (reddish-green edges).
Abstract. This paper introduces a new approach for computing a hierarchical aspect graph of curved objects using multiple range images. Characteristic deformations occur in the neighborhood of a cusp point as the viewpoint moves. We analyze the division types of viewpoint space in scale space in order to generate aspect graphs from a limited number of viewpoints. Moreover, the aspect graph can be generated automatically using an algorithm based on a minimization criterion.



1 Introduction 

Koenderink and van Doorn introduced the notion of aspect graphs for representing an object shape [KD]. An aspect is defined as a qualitatively distinct view of an object as seen from a set of connected viewpoints in the viewpoint space. Every viewpoint in each set gives a qualitatively similar projection of the object. In an aspect graph, nodes represent aspects and arcs denote visual events connecting two aspects. For polyhedral objects, the aspect graph can be computed by deriving the exact partition of viewpoint space from the geometric model [GC]. Many researchers have studied visual events and boundary viewpoints for piecewise-curved objects [RI][PK].

A panel discussed the theme “Why Aspect Graphs Are Not (Yet) Practical for Computer Vision” [FA]. One issue raised by the panel is that aspect graph research has not included any notion of scale. As an object's complexity increases, the number of aspects also increases. Therefore, the method can only be applied to simple objects, such as solids of revolution. If an object is complex, its aspect graph is too large to be used for matching. Introducing the concept of scale should allow this large set of theoretical aspects to be reduced to a smaller set. From this viewpoint, Eggert has proposed the scale space aspect graph for polyhedra [EB]. These approaches address the case of a camera having finite resolution [SP][EB]. From the same viewpoint, we proposed a method to generate a hierarchical aspect graph using silhouettes of curved objects [MK]. Although objects can be matched quickly using only silhouettes, their exact orientation cannot be determined this way. Thus, in this paper we propose a method for generating a hierarchical aspect graph of curved objects using multiple range data.

2 Primitive techniques and overview of aspect analysis 

The curvatures used here are defined as follows. 
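Suppose a surface is given in the parametric form $X(u,v) = (x(u,v), y(u,v), z(u,v))$. A tangent line at $X(u,v)$ is denoted by $t(u,v) = du\,X_u(u,v) + dv\,X_v(u,v)$. The curvature at $X$ along $(du,dv)$ is defined as $\lambda(du,dv) = \delta_2(du,dv)/\delta_1(du,dv)$, where

$$\delta_1(du,dv) = \begin{pmatrix} du & dv \end{pmatrix}\begin{pmatrix} X_u\cdot X_u & X_u\cdot X_v \\ X_v\cdot X_u & X_v\cdot X_v \end{pmatrix}\begin{pmatrix} du \\ dv \end{pmatrix}, \qquad \delta_2(du,dv) = \begin{pmatrix} du & dv \end{pmatrix}\begin{pmatrix} X_{uu}\cdot n & X_{uv}\cdot n \\ X_{uv}\cdot n & X_{vv}\cdot n \end{pmatrix}\begin{pmatrix} du \\ dv \end{pmatrix},$$

$$X_u = \frac{\partial X}{\partial u},\quad X_v = \frac{\partial X}{\partial v},\quad X_{uu} = \frac{\partial^2 X}{\partial u^2},\quad X_{uv} = \frac{\partial^2 X}{\partial v\,\partial u},\quad X_{vv} = \frac{\partial^2 X}{\partial v^2},$$

and $n$ is the unit normal of the surface.

Let $(u,v) = (\xi_1,\eta_1)$ and $(\xi_2,\eta_2)$ be the directional vectors that maximize and minimize the curvature at the point $p$. The maximum curvature $\kappa_1$, the minimum curvature $\kappa_2$, the mean curvature $H$ and the Gaussian curvature $K$ are then defined as

$$\kappa_1 = \lambda(\xi_1,\eta_1),\quad \kappa_2 = \lambda(\xi_2,\eta_2),\quad H = \frac{\kappa_1+\kappa_2}{2},\quad K = \kappa_1\kappa_2,$$

respectively. Characteristic contours which satisfy $H = (\kappa_1+\kappa_2)/2 = 0$ and $K = \kappa_1\kappa_2 = 0$ are called H0 contours and K0 contours, respectively.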
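For a range image, the surface is naturally parametrized as a Monge patch $X(u,v) = (u, v, z(u,v))$. The sketch below is a minimal illustration, not the author's implementation. It evaluates the first and second fundamental forms with finite differences (`np.gradient`) and obtains $K$ and $H$ from their standard closed forms, which agree with extremizing $\lambda = \delta_2/\delta_1$; then $\kappa_{1,2} = H \pm \sqrt{H^2 - K}$. The sign of $H$ depends on the chosen normal orientation, here $n \propto (-z_u, -z_v, 1)$. The sampling step `h` and the function name are assumptions.

```python
import numpy as np

def depth_map_curvatures(z, h=1.0):
    """Principal, mean and Gaussian curvature of the Monge patch (u, v, z(u, v))."""
    zu, zv = np.gradient(z, h)          # derivatives along axis 0 (u) and axis 1 (v)
    zuu, zuv = np.gradient(zu, h)
    _, zvv = np.gradient(zv, h)
    # First fundamental form (the matrix in delta_1).
    E = 1.0 + zu**2
    F = zu * zv
    G = 1.0 + zv**2
    # Second fundamental form (the matrix in delta_2) with n = (-zu, -zv, 1) / w.
    w = np.sqrt(1.0 + zu**2 + zv**2)
    L, M, N = zuu / w, zuv / w, zvv / w
    denom = E * G - F**2
    K = (L * N - M**2) / denom
    H = (E * N - 2.0 * F * M + G * L) / (2.0 * denom)
    disc = np.sqrt(np.maximum(H**2 - K, 0.0))
    return H + disc, H - disc, H, K

u = np.linspace(-1.0, 1.0, 21)
U, V = np.meshgrid(u, u, indexing="ij")
h = u[1] - u[0]
k1, k2, H, K = depth_map_curvatures(0.5 * (U**2 + V**2), h)    # elliptic point: K > 0
_, _, H_s, K_s = depth_map_curvatures(0.5 * (U**2 - V**2), h)  # saddle point: K < 0
print(K[10, 10], H[10, 10], K_s[10, 10], H_s[10, 10])
```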

Scale-space filtering is a useful method for analyzing a signal qualitatively by managing the ambiguity of scale in an organized and natural way. In this section, we extend scale-space filtering from 2-D contour analysis to 3-D surface analysis. The Gaussian convolution of the surface $\phi$ is formulated by a diffusion equation:

$$\frac{\partial^2\phi}{\partial u^2} + \frac{\partial^2\phi}{\partial v^2} = \frac{1}{t}\frac{\partial\phi}{\partial t}, \qquad (1)$$

where $\phi(u,v) = (x(u,v), y(u,v), z(u,v))$ is a parametric representation of a surface. This equation is approximated by the difference equation:

$$\phi(u,v,t+\Delta t) = \phi(u,v,t) + \Delta t\left[\frac{\phi(u-\Delta u,v,t) - 2\phi(u,v,t) + \phi(u+\Delta u,v,t)}{\Delta u^2} + \frac{\phi(u,v-\Delta v,t) - 2\phi(u,v,t) + \phi(u,v+\Delta v,t)}{\Delta v^2}\right]. \qquad (2)$$

Iterating (2), the curvature at each sample point converges to a constant.
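One explicit step of the difference equation (2) can be written with array slicing. In this sketch, which is an assumption and not the author's code, the surface is stored as an array of shape (n_u, n_v, 3) holding $(x, y, z)$ for each parameter sample, and border samples are replicated. The update uses a constant coefficient $\Delta t$ per step, as written in (2). The explicit scheme is stable only for $\Delta t\,(1/\Delta u^2 + 1/\Delta v^2) \le 1/2$, so the function rejects larger steps.

```python
import numpy as np

def diffuse_surface(phi, n_steps, dt=0.2, du=1.0, dv=1.0):
    """Iterate the explicit difference scheme (2) on a sampled surface of shape (nu, nv, 3)."""
    if dt * (1.0 / du**2 + 1.0 / dv**2) > 0.5:
        raise ValueError("time step too large for the explicit scheme")
    phi = np.array(phi, dtype=float)   # work on a copy
    for _ in range(n_steps):
        p = np.pad(phi, ((1, 1), (1, 1), (0, 0)), mode="edge")   # replicate border samples
        d2u = (p[:-2, 1:-1] - 2.0 * phi + p[2:, 1:-1]) / du**2    # second difference in u
        d2v = (p[1:-1, :-2] - 2.0 * phi + p[1:-1, 2:]) / dv**2    # second difference in v
        phi = phi + dt * (d2u + d2v)
    return phi

rng = np.random.default_rng(1)
uu, vv = np.meshgrid(np.arange(16.0), np.arange(16.0), indexing="ij")
surface = np.stack([uu, vv, rng.normal(size=uu.shape)], axis=-1)  # noisy depth values
smoothed = diffuse_surface(surface, n_steps=50)
print(surface[..., 2].std(), smoothed[..., 2].std())  # the depth roughness decreases
```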

Figure 1 shows the primitives causing the unique changes in the topology of the zero-crossing surface when the shape is smooth. Diagrams 1a and 2a in Figure 1 show vertical sections of the zero-crossing surface, in which the u coordinate has the constant values u1 and u2. Diagrams 1b and 2b show horizontal sections of the zero-crossing surfaces, in which the scale t has the constant values t1 and t2. Diagrams 1a and 1b show the changes occurring when a zero-crossing surface comes into contact with another surface. Diagrams 2a and 2b show the changes occurring when a zero-crossing surface comes into existence. Diagrams 1a and 1b also illustrate the non-monotonic behavior of scale-space [YP]: as the scale t decreases, the surface first opens at the top level, then closes in the next step, and later appears again.

We generate a hierarchical aspect graph from orthographic projections obtained from a limited number of viewpoints. If the viewpoint space (α[0,360], θ[0,360]) is divided into pieces of size (^, ^), then the number of sample viewpoints is n/2 × n (n = 2, 4, 6, 8, 10, ...). In this paper, the number of sample viewpoints is 36 × 18 (n = 10).

Fig. 1. The primitives causing topology changes of zero-crossing surfaces.
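The 36 × 18 sampling can be illustrated by generating unit viewing directions on a regular longitude/latitude grid. The 10° step, the use of cell centres and the spherical-coordinate convention (latitude measured from the pole, 0° to 180°) are assumptions made for this sketch, not details given in the text.

```python
import numpy as np

def sample_viewpoints(step_deg=10.0):
    """Unit viewing directions at the centres of a longitude x latitude grid."""
    lon = np.deg2rad(np.arange(0.0, 360.0, step_deg) + step_deg / 2)   # 36 values for 10 degrees
    lat = np.deg2rad(np.arange(0.0, 180.0, step_deg) + step_deg / 2)   # 18 values, from the pole
    LON, LAT = np.meshgrid(lon, lat, indexing="ij")
    dirs = np.stack([np.sin(LAT) * np.cos(LON),
                     np.sin(LAT) * np.sin(LON),
                     np.cos(LAT)], axis=-1)
    return dirs.reshape(-1, 3)

views = sample_viewpoints()
print(views.shape)  # (648, 3)
```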

3 The Hierarchical Partition Method of a Viewpoint Space

3.1 Aspect Analysis of Occluding Contour 

If the observed image is the same, the classification of the orthographic image does not change, since its differential geometric characteristics, namely the mean curvature and the Gaussian curvature, do not change.

The orthographic image deforms in the following two cases. In the first case, a surface occluded by other surfaces appears. In the second case, a surface becomes occluded by other surfaces. In both cases, differential geometric continuity is not satisfied and occluding contours come into existence.

We analyze aspect changes in the neighborhood of cusp points where occluding contours occur. This matters when an aspect is described by a contour topology that includes occluding contours. When the depth image of the orthographic projection is convolved with a Gaussian, a unique deformation process occurs at this discontinuity point. The appearance of occluding contours has three prototypes [TH]: lips, beaks and swallowtails. We partition the viewpoint space using these unique events, observed from a limited number of viewpoints. The partitioning of the viewpoint space is reliable because it is restricted by the unique events where occluding contours occur.

3.2 Hierarchical events 

Every viewpoint in a viewpoint space can be classified into two different types: a 
stable viewpoint or an accidental viewpoint. For stable viewpoints, there exists 
an open neighborhood of viewpoints that gives the same aspect of the object. 

In partitioning viewpoint space into aspects, the boundary between two aspects is called an event. Each visual event type can be characterized by alterations in the feature configurations. As the number of processes increases, an aspect is partitioned into finer aspects. The deformation processes in a deformation number differ at the boundary between two scales. The boundary between two aspects and between two scales is called a hierarchical event. In order to generate a hierarchical aspect graph automatically from depth maps of a limited number of observed viewpoints and scales, we must analyze the hierarchical and visual events. The number of deformation processes is the applied number of events.

The number of different hierarchical events is finite, and they depend on the difference in zero-crossing contour topology of scale-space images. A hierarchical event occurs when one of the following occurs in scale-space images.

Type 1. A zero-crossing surface comes into contact with another surface. 

Type 2. The singularity of two zero-crossing surfaces is at the same height. 

Type 3. A zero-crossing surface disappears. 

Figure 2(a)(b) shows the three types of hierarchical events. a0 represents a zero-crossing surface. Lines of a0 illustrate the intersection of a zero-crossing surface where the u coordinate is constant, and the dotted lines of a0 represent the intersection of a zero-crossing surface where the t coordinate is zero. a1, a2 and a3 are zero-crossing contours in the u and v coordinates. The label changes from a1 to a3 as scale t decreases.

Type 1-a and 1-b show the difference of singularity heights. A and C are two 
stable views, and B is an accidental view. One zero-crossing surface is higher 
than the other in A, but lower than the other zero-crossing surface in C, which 
means A and C differ in the order of deformation. B deforms in two places at 
once. A zero-crossing surface does not exist between the two surfaces in 1-a, but 
does in 1-b. 

Type 2-a and 2-b show a zero-crossing surface which comes into contact with another contour in a scale-space image. Type 2-a shows two zero-crossing surfaces which are inscribed, and type 2-b shows two zero-crossing surfaces which are circumscribed. Since the topology of the zero-crossing surfaces is the same in the first frame a0 of type 2-a, the viewpoint space belongs to one aspect. However, in the next deformation process, a zero-crossing surface comes into contact with another surface, and a difference occurs. A and C are two stable views, and B is an accidental view. At B, a zero-crossing surface accidentally contacts another zero-crossing surface. B is the boundary between the two aspects A and C, that is, the event.

Type 3-a and 3-b show a zero-crossing surface which disappears in a scale- 
space image. If the outline is a circle, it will never deform. As the surface is 
smooth, it does not deform without reaching the comparable process number. 
C1 and C2 are two stable views and B is the accidental view. The zero-crossing surface disappears at the viewpoint B.

Hierarchical events can be classified into these three types, depending on the properties of scale-space. If the projected depth image changes smoothly, then the zero-crossing surface also changes smoothly. All of these events are considered in this paper. This discussion is based on Morse theory [TH], which studies the theoretical behavior of extrema and saddle points.
3.3 The Partition Types of a Viewpoint Space.

The partition type and its changes are limited by the three hierarchical events. It is important to analyze the points at which the boundary lines of the viewpoint space intersect, because two hierarchical events occur at such an intersection point at the same time. If Type 1-b and Type 2-a events in Figure 2(a)(b) occur at the same time, then the two boundary lines dividing the viewing space intersect as in Figure 2(c) and Figure 3(a). If Type 1-a and Type 2-b events occur at the same time, then the two boundary lines dividing the viewing space intersect as in Figure 2(c) and Figure 3(a). a0 shows the zero-crossing surfaces of the nine viewpoints, such as (B1, B2) and (C1, C2). The lines inside the nine square frames of a0 are the intersections of zero-crossing surfaces where the u coordinate is constant. The dotted lines inside the nine square frames of a0 are the intersections of zero-crossing surfaces where the t coordinate is zero. The label a1 is the coarse level and a3 is the fine level; they represent the value of the scale t. The square frames of a0, a1, a2 and a3 represent the viewpoint space, which has nine viewpoints. A circular frame means that zero-crossing contours occurred in the u and v coordinates. In this case, Type 2-a and Type 1-b occurred, and the aspect of the viewpoint space is the same. At scale a1, there is one viewpoint space. At scale a2, the viewpoint space is divided into three, and the events (B2, C1), (B1, C2) and (A1, B2) occur. At scale a3, the viewpoint space is divided into four, and the event (B2, B1) happens. The number of partition primitives is limited to 15, the number of combinations of two events selected from the 6 hierarchical events. Each event is classified in more detail depending on whether the combinations of the zero-crossing surfaces are K0 or H0 surfaces. The viewpoint space is partitioned using the stable viewpoints and their neighboring viewpoints among the limited set of viewpoints.
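The count of 15 partition primitives is the number of unordered pairs drawn from the 6 hierarchical event types (1-a, 1-b, 2-a, 2-b, 3-a, 3-b), that is $\binom{6}{2} = 15$. The short enumeration below makes this explicit. The event labels follow Figure 2, and, as in the text, every pair of distinct events is counted.

```python
from itertools import combinations
from math import comb

EVENT_TYPES = ["1-a", "1-b", "2-a", "2-b", "3-a", "3-b"]

pairs = list(combinations(EVENT_TYPES, 2))   # unordered pairs of distinct events
assert len(pairs) == comb(6, 2) == 15
print(pairs[:3])  # [('1-a', '1-b'), ('1-a', '2-a'), ('1-a', '2-b')]
```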

Three hierarchical events can also occur at an intersection point at the same time. However, we do not use events where three types exist at one viewpoint and scale, because they seldom occur in practice. A combination of more than three events is regarded as a sequential change of two hierarchical events. Thus, all visual events are considered.



4 Generating an Algorithm of Hierarchical Aspect Graphs

Our algorithm to generate an aspect graph can be outlined in the following steps; Figure 3(b) shows the flow chart of the algorithm.

1. We observe an object from 36 × 18 viewpoints and acquire the depth map of the orthographic projection using a laser range finder.

2. We filter each depth map at a limited number of resolutions.

3. After filtering, each depth map is divided into regions using the signs of the mean curvature and the Gaussian curvature.

4. By detecting the topology changes of the K0 and H0 contours on the
KH-image from the top level, the zero-crossing surfaces are inferred using KH- 







S. Morita 



images of the limited resolution. Actually the changes of KH-images over scale 
are registered as sets of the primitive operator. 

5. The topology changes of the K0 and H0 contours are recorded over scale. If the topology changes cannot be determined because of the limited number of resolutions examined, all deformation processes that can be obtained from the observed images are recorded.

6. Among the possible deformation processes, the minimal process that is consistent with the neighboring viewpoints is selected. Inconsistencies are detected using the partition types of a viewpoint space.
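Steps 3 to 5 can be illustrated with a simplified sketch. It assumes that mean and Gaussian curvature maps are already available for each filtered depth map, and labels each sample by the signs of $K$ and $H$ to form the KH-image. As a hypothetical stand-in for detecting topology changes of the K0 and H0 contours, it reports the scales at which the number of 4-connected regions of some KH class changes. This is not the author's exact criterion, and the label numbering and function names are ours.

```python
from collections import deque

import numpy as np

def kh_image(H, K):
    """Labels: 1 (K>0,H>0), 2 (K>0,H<0), 3 (K<0,H>0), 4 (K<0,H<0); 0 where K or H is zero."""
    label = np.zeros(np.shape(H), dtype=int)
    label[(K > 0) & (H > 0)] = 1
    label[(K > 0) & (H < 0)] = 2
    label[(K < 0) & (H > 0)] = 3
    label[(K < 0) & (H < 0)] = 4
    return label

def count_regions(label):
    """Number of 4-connected regions for each nonzero KH label (breadth-first search)."""
    rows, cols = label.shape
    seen = np.zeros(label.shape, dtype=bool)
    counts = {}
    for r in range(rows):
        for c in range(cols):
            lab = int(label[r, c])
            if lab == 0 or seen[r, c]:
                continue
            counts[lab] = counts.get(lab, 0) + 1
            seen[r, c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny, nx] and label[ny, nx] == lab):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
    return counts

def topology_changes(labels_by_scale):
    """Indices k where region counts differ between scale k and scale k + 1."""
    counts = [count_regions(lab) for lab in labels_by_scale]
    return [k for k in range(len(counts) - 1) if counts[k] != counts[k + 1]]

K_map = np.ones((2, 3))
coarse = kh_image(np.array([[1.0, 1.0, -1.0], [1.0, -1.0, -1.0]]), K_map)
fine = kh_image(np.array([[1.0, -1.0, 1.0], [1.0, -1.0, 1.0]]), K_map)
print(count_regions(coarse), count_regions(fine), topology_changes([coarse, fine]))
```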

The dividing maps of the viewpoint space observed using the range sensor are shown in Figure 4. The y axis is latitude (0° to 180°) and the x axis is longitude (0° to 360°) ((A) t = 20, (B) t = 90, (C) t = 270).

We analyzed the division types of viewpoint space in the neighborhood of a cusp point in order to generate aspect graphs from a limited number of viewpoints. The aspect graph can be generated automatically using an algorithm based on a minimization criterion.
Fig. 2. Hierarchical events. (a) KH-image changes as the viewpoint moves. (b) Zero-crossing surface changes as the viewpoint moves. (c) KH-image changes as the viewpoint moves.




Fig. 3. (a) Zero-crossing surface changes as the viewpoint moves. (b) Algorithm for generating a hierarchical aspect graph.




Fig. 4. The dividing maps of the viewpoint space observed using the range sensor. The y axis is latitude (0° to 180°) and the x axis is longitude (0° to 360°). ((A) t = 20, (B) t = 90, (C) t = 270)
Abstract. This paper introduces a new approach for generating the hierarchical description of a non-rigid density object. Scale-space is useful for hierarchical analysis. Process-grammar, which describes the deformation process between a circle and the shape, is proposed. We use these two approaches to track the deformation of a non-rigid density object. We extend 2D process-grammar to 3D density process-grammar. We analyze the events in which the topology of the zero-crossing surface of scale-space changes as a density object changes smoothly. This analysis helps to generate 3D density process-grammar from a limited number of observations. The method can be used for tracking a non-rigid object in MRI data.



1 Introduction 

Scale-space [Wil] is used to analyze signals and contours at several resolutions. 3D medical data are analyzed using scale-space to obtain the surfaces of internal organs [VK]. On the other hand, many studies have been devoted to the analysis of motion in magnetic resonance images. The most frequently used method reconstructs the motion using a deformable model [TW][CC][MT]. The purpose is mainly to find the surface of a non-rigid shape.

We propose a hierarchical description of the internal structure of a medical non-rigid density object. We define a non-rigid density object as an object with a density whose internal structure changes over time. In contrast to surface analysis using deformable models and to statistical segmentation using scale-space, our purpose is not to analyze a surface, but to analyze the internal structure of a medical density object.

Koenderink and van Doorn introduced the notion of aspect graphs for representing a shape [KV]. An aspect is defined as a qualitatively distinct view of an object as seen from a set of connected viewpoints in the viewpoint space. We extend the concept of the aspect graph to non-rigid density objects. In this paper, we define an aspect as a qualitatively distinct internal structure as a non-rigid density object changes in density. We extend the 2D process-grammar [LE] to a 3D density process-grammar in order to generate the description of a non-rigid density object. 2D process-grammar describes the deformation process of a 2D contour between a circle and the shape. We define 3D density process-grammar as the deformation process between a flat density and the density. We use scale-space filtering to generate 3D density process-grammar automatically. We analyze the events in which the topology of the zero-crossing surface changes as a density object changes smoothly. This analysis helps to generate 3D density process-grammar from a limited number of observations.



2 3D Density Scale-space Filtering 

3D density scale-space filtering can be used to analyze a non-rigid density object hierarchically.

Assume that a mesh has a parametric form. Then $f(u,v,w) = h(x(u,v,w), y(u,v,w), z(u,v,w))$ is the density value at the image coordinate $(x(u,v,w), y(u,v,w), z(u,v,w))$ for the mesh coordinate $(u,v,w)$. A point on the derived mesh is the result of the convolution $\phi(u,v,w,\sigma) = f(u,v,w) * g(u,v,w,\sigma)$. The Gaussian-convolved mesh satisfies the following diffusion equation:

$$\frac{\partial^2\phi}{\partial u^2} + \frac{\partial^2\phi}{\partial v^2} + \frac{\partial^2\phi}{\partial w^2} = \frac{1}{t}\frac{\partial\phi}{\partial t}$$

This equation can be approximated by a difference equation.



2.1 Hierarchical density analysis based on 3D density scale-space filtering



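A mesh $\phi(u,v,w)$ is divided into elements using the positive and negative values of the Gaussian curvature $K$ and the mean curvature $H$.

The curvature used here is defined as follows. Assume that a mesh has a parametric form, $X(u,v,w) = \phi(u,v,w)$. The curvature at $X$ along $(du,dv,dw)$ is defined as $\lambda(du,dv,dw) = \delta_2(du,dv,dw)/\delta_1(du,dv,dw)$, where

$$\delta_1(du,dv,dw) = \begin{pmatrix} du & dv & dw \end{pmatrix}\begin{pmatrix} X_uX_u & X_uX_v & X_uX_w \\ X_vX_u & X_vX_v & X_vX_w \\ X_wX_u & X_wX_v & X_wX_w \end{pmatrix}\begin{pmatrix} du \\ dv \\ dw \end{pmatrix},$$

$$\delta_2(du,dv,dw) = \begin{pmatrix} du & dv & dw \end{pmatrix}\begin{pmatrix} X_{uu} & X_{uv} & X_{uw} \\ X_{vu} & X_{vv} & X_{vw} \\ X_{wu} & X_{wv} & X_{ww} \end{pmatrix}\begin{pmatrix} du \\ dv \\ dw \end{pmatrix},$$

with $X_u = \partial X/\partial u$, $X_{uu} = \partial^2 X/\partial u^2$, $X_{uv} = \partial^2 X/\partial u\,\partial v$, and analogously for the remaining derivatives.

Let $(u,v,w) = (\eta_1,\zeta_1,\gamma_1)$ and $(\eta_2,\zeta_2,\gamma_2)$ be the directional vectors that maximize and minimize the curvature at the point $p$. The maximum curvature $\kappa_1$, the minimum curvature $\kappa_2$, the mean curvature $H$ and the Gaussian curvature $K$ are then defined as

$$\kappa_1 = \lambda(\eta_1,\zeta_1,\gamma_1),\quad \kappa_2 = \lambda(\eta_2,\zeta_2,\gamma_2),\quad H = \frac{\kappa_1+\kappa_2}{2},\quad K = \kappa_1\kappa_2,$$

respectively.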

Characteristic surfaces which satisfy $H = (\kappa_1+\kappa_2)/2 = 0$ and $K = \kappa_1\kappa_2 = 0$ are called H0 surfaces and K0 surfaces, respectively. We divided the density into elements using positive and negative values of the Gaussian curvature $K$ and the mean curvature $H$. The resulting image is termed a KH-image. Figure 1 shows the primitives causing the unique changes in the topology of the zero-crossing surface when a non-rigid density object changes smoothly. Figure 1(a) shows the changes occurring when a zero-crossing surface disappears. Figure 1(b) shows the changes occurring when a zero-crossing surface comes into contact with another surface.

Fig. 1. The primitives causing topology changes of zero-crossing surfaces.



3 Hierarchical Event of 3D Density Object Space 

3.1 Hierarchical events 

We describe the changes of aspects in the density object space. In partitioning the density object space into aspects, the boundary between two aspects is called an event. As the scale increases, an aspect is partitioned into finer aspects. The boundary between two aspects and between two scales is called a hierarchical event.

The number of different hierarchical events is finite, and they depend on the difference in zero-crossing surface topology of 3-D density scale-space. A hierarchical event occurs when one of the following three types occurs in 3-D density scale-space. In type 1, a zero-crossing surface comes into contact with another surface. In type 2, the singularities of two zero-crossing surfaces are at the same height. In type 3, a zero-crossing surface disappears. Figure 2(a)(b) shows the three types of hierarchical events. a0 represents a zero-crossing surface. Lines of a0 illustrate the intersection of a zero-crossing surface where the u coordinate is constant, and the dotted lines of a0 represent the intersection of a zero-crossing surface where the t coordinate is zero. a1, a2 and a3 are zero-crossing contours in the u and v coordinates. The label changes from a1 to a3 as scale t decreases.

Type 1-a and 1-b show the difference of singularity heights. A and C are two 
stable density objects, and B is an accidental density object. One zero-crossing 
surface is higher than the other in A, but lower than the other zero-crossing 
surface in C, which means A and C differ in the order of deformation. 

Type 2-a and 2-b show a zero-crossing surface which comes into contact with another contour in a scale-space image. Type 2-a shows two zero-crossing surfaces which are inscribed, and type 2-b shows two zero-crossing surfaces which are circumscribed.

Fig. 2. Hierarchical events. (a) KH-image changes as a density object changes. (b) Zero-crossing surface changes as scale increases. (c) Partition changes of a density object space.

Type 3-a and 3-b show a zero-crossing surface which disappears in a scale- 
space image. If the outline is a circle, it will never deform. As the surface is 
smooth, it does not deform without reaching the comparable process number. 
The zero-crossing surface disappears in the density object B. 

By studying the theoretical behavior of extrema and saddle points [TH], blob events can be classified into four basic types: annihilation, merge, split and creation [CB][LI]. All of these events are considered in this paper.



3.2 The Partition Types of Density Object Space. 

At an intersection point, two hierarchical events occur at the same time. If Type 1-b and Type 2-a events in Figure 2(a)(b) occur at the same time, the two boundary lines dividing the viewing space intersect as in Figure 2(c). If Type 1-a and Type 2-b events occur at the same time, then the two boundary lines dividing the viewing space intersect as in Figure 4.

The number of partition primitives is limited to 15, the number of combinations of two events selected from the 6 hierarchical events. Each event is classified in more detail depending on whether the combinations of the zero-crossing surfaces are K0 or H0 surfaces. The density object space is partitioned using the stable density objects and their neighboring density objects among the limited observations.

We do not use events where three types exist at a voxel of one density object and scale, because they seldom occur in practice. A combination of more than three events is regarded as a sequential change of two hierarchical events. Thus, all events are considered.
3.3 3D Density Process-grammar based on Scale-space Topological Analysis

We extend the process-grammar to analyze a 3D density object. All densities are transformed from a flat density in the same way as 2D contours are transformed from the circle in 2D process-grammar. The four elements M+, M-, m-, m+ of 2D process-grammar are defined as the primitive elements. The primitive elements of 3D density process-grammar are defined using the four elements (K > 0, H > 0), (K > 0, H < 0), (K < 0, H > 0) and (K < 0, H < 0). All smoothed densities are described using these primitive elements. Just as 2D process-grammar describes the relation between adjacent primitive elements, so does 3D density process-grammar. 3D density process-grammar thus describes the process by which the density is transformed from a flat density.

We generate the deformation process based on the analysis in Sections 3.1 and 3.2. We describe the density in terms of the KO blobs only, because the appearance and disappearance of the HO surfaces depend on the appearance and disappearance of the KO blobs. We use 3D density scale-space filtering because the density becomes a flat density as the scale t increases. When two HO surfaces come into contact, the number of surfaces does not decrease monotonically as t increases. When KO and HO blobs are very close, the KO and HO surfaces can come into contact without affecting the multi-resolution structure. For this reason, we do not use the contact of KO blobs as a deformation process in this paper. This discussion is based on Morse theory [TH]. The KO surface of the 3D scale-space can be generated by linking the KO blobs as t increases. The 3D density process-grammar is then generated from the 3D scale-space.
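The filter itself is not spelled out in this section. As a minimal sketch, assuming the 3D density scale-space is produced by iterating explicit linear diffusion (Section 5 indexes its results by an iteration number), the code below shows the property the argument relies on: repeated smoothing drives the density toward a flat density while conserving its total. The function name `diffuse`, the step `dt`, and the zero-flux boundary are illustrative choices, not the authors' implementation.

```python
import numpy as np


def diffuse(volume: np.ndarray, n_iter: int, dt: float = 1.0 / 6.0) -> np.ndarray:
    """Explicit linear diffusion of a 3D density with a 6-neighbour Laplacian.

    Edge padding imposes a zero-flux (Neumann) boundary, so the total density
    is conserved; dt <= 1/6 keeps the explicit 3D scheme stable.
    """
    u = volume.astype(float)
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")
        lap = (p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1]
               + p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1]
               + p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2]
               - 6.0 * u)
        u = u + dt * lap
    return u


rng = np.random.default_rng(0)
density = rng.random((16, 16, 16))          # hypothetical test density
smoothed = diffuse(density, n_iter=500)
print(density.std(), smoothed.std())        # the spread shrinks toward a flat density
```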

We then transform the set of KO blobs into the sequence $\{Mm_1, Mm_2, \ldots, Mm_{a+b}\}$. If the order satisfies $t_i > t_j$ for $i < j$, it reflects the deformation process of the 3D density process-grammar. Each $Mm_i$ is represented as $(g(i), t_i, d_i)$, where $g(i) = (x_i, y_i, z_i)$ is its position in xyz coordinates and $t_i$ is its scale. The last element $d_i$ indicates the sign of H: $d_i = +1$ for positive H and $d_i = -1$ for negative H. The deformation of $Mm_1$ occurs first, followed by that of $Mm_2$, and so on up to $Mm_{a+b}$.
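A small sketch of the ordering step, assuming each KO blob is stored as the triple $(g(i), t_i, d_i)$ described above. The `Blob` class, the sample coordinates, and the scale values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    """A KO blob Mm_i = (g(i), t_i, d_i); field names are our own."""
    position: tuple  # g(i) = (x_i, y_i, z_i)
    scale: float     # t_i, the scale attached to the blob
    sign_h: int      # d_i: +1 where H > 0, -1 where H < 0


def deformation_process(blobs):
    """Order blobs so that t_1 > t_2 > ... (Mm_1 deforms first)."""
    return sorted(blobs, key=lambda b: b.scale, reverse=True)


blobs = [
    Blob((10.0, 42.0, 50.0), 32.0, +1),
    Blob((55.0, 48.0, 51.0), 270.0, +1),
    Blob((60.0, 20.0, 47.0), 128.0, -1),
]
process = deformation_process(blobs)
print([b.scale for b in process])  # [270.0, 128.0, 32.0]
```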

4 Matching using Geometric Invariance Features 

An object can be matched to another object hierarchically by using the 3D density process-grammar. To relate one feature to another, we use geometric invariance features. We consider a polyhedron in which three faces meet at each vertex. Let $n_i$ denote the homogeneous coordinates of the ith vertex of the polyhedron. The invariants [RF] of this polyhedron are ratios of products of determinants; the first is

$$I_1 = \frac{\det N_{3561} \cdot \det N_{3542}}{\det N_{3564} \cdot \det N_{3512}},$$

and $I_2$ and $I_3$ are defined analogously as ratios of products of two such determinants over other index sets of the six vertices,

where $N_{ijkl} = [n_i, n_j, n_k, n_l]$ is a 4 × 4 matrix. We first use the six KO blobs that are generated at the highest scales and build from them a polyhedron in which three faces meet at each vertex. We then calculate the geometric invariance of this polyhedron. In this way, the KO blobs of one scale-space are put into correspondence with those of another scale-space in turn. Finally, we evaluate the similarity of the objects by comparing the two deformation processes using the geometric invariance.
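To illustrate why such ratios are invariant, the sketch below assumes the first invariant has the form $I_1 = \det N_{3561}\det N_{3542} / (\det N_{3564}\det N_{3512})$ with homogeneous vertex coordinates $n_i = (x_i, y_i, z_i, 1)$. Every vertex index appears equally often in the numerator and the denominator, so a projective map H and any rescaling of the $n_i$ cancel out. The six points and the matrix H are made-up test data.

```python
import numpy as np


def det_n(points: np.ndarray, idx) -> float:
    """det N_ijkl with N_ijkl = [n_i, n_j, n_k, n_l]; idx holds 1-based vertex numbers."""
    cols = [v - 1 for v in idx]
    return float(np.linalg.det(points[cols].T))


def invariant_i1(points: np.ndarray) -> float:
    """Assumed form: I1 = det N3561 det N3542 / (det N3564 det N3512)."""
    num = det_n(points, (3, 5, 6, 1)) * det_n(points, (3, 5, 4, 2))
    den = det_n(points, (3, 5, 6, 4)) * det_n(points, (3, 5, 1, 2))
    return num / den


# Six hypothetical blob centres in homogeneous coordinates (x, y, z, 1).
pts = np.array([
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.3, 1.0],
    [0.4, 1.2, 1.0, 1.0],
])
H = np.array([               # an arbitrary invertible 4 x 4 projective map
    [1.0, 0.2, 0.0, 0.5],
    [0.1, 0.9, 0.3, -0.2],
    [0.0, 0.4, 1.1, 0.3],
    [0.05, -0.1, 0.2, 1.0],
])
# Transform every point and rescale each homogeneous vector independently.
moved = (H @ pts.T).T * np.array([[2.0], [0.5], [1.0], [3.0], [1.5], [0.7]])
print(invariant_i1(pts), invariant_i1(moved))  # equal up to rounding
```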



5 Medical Data Analysis 

The algorithm for analyzing the moving heart consists of the following steps:

1. A 3D density scale-space is generated from a density object (Section 2).

2. We transform the 3D scale-space into the 3D density process-grammar using topological analysis (Section 3).

3. We track the change in these points over time.

4. We repeat steps 1 through 3.

5. We evaluate the similarity of the density objects hierarchically using the geometric invariance features of the 3D density process-grammar (Section 4).

We used sequential MRI images of a moving heart. Each image was 100 × 100 × 100 voxels. Figures 3(A), (B), and (C) show the KH-images on the slices z = 50, y = 50, and x = 50. Figures 3(D), (E), and (F) show, on the same slices, the zero-crossing surfaces obtained by stacking the KO blobs with K > 0, H > 0. Figure 3(G) shows the KH-image of the region with K > 0, H > 0 at iteration numbers 8, 32, 128, and 496. The KH-images of the four sequential heart images are shown at iteration number t = 270. We obtain the deformation process $\{Mm_1, Mm_2, \ldots, Mm_9\}$ from the four hearts s1, s2, s3, and s4. First, we calculate a geometric invariance using the six KO blobs $Mm_1$ to $Mm_6$. We then calculate the geometric invariances for the successive sets of six blobs ($Mm_2$ to $Mm_7$, $Mm_3$ to $Mm_8$, and $Mm_4$ to $Mm_9$). The results are shown in Table 1. Figure 3(H) shows the KO blobs with K > 0, H > 0 obtained from the four sequential images. The geometric invariances of the four sequential objects are nearly equal, although the values change slightly over time.
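The claim that the invariances are nearly equal can be quantified with the values reported in Table 1. This sketch computes, for each invariant, the relative spread (max - min) / mean over the images s1 to s4; the spread measure is our choice, not the paper's, and the row layout assumes the table pairing of blob sets per image.

```python
# Geometric invariances (I1, I2, I3) copied from Table 1.
mm1_mm6 = {
    "s1": (1.060174, 0.860518, 2.143412),
    "s2": (1.060174, 0.860518, 2.143412),
    "s3": (1.062561, 0.856981, 2.143711),
    "s4": (1.062669, 0.855323, 2.136194),
}
mm4_mm9 = {
    "s1": (1.114275, 0.985219, 4.950984),
    "s2": (1.114275, 0.985219, 4.950984),
    "s3": (1.114944, 0.985621, 4.999187),
    "s4": (1.114350, 0.993526, 5.314854),
}


def relative_spread(rows):
    """(max - min) / mean of each invariant across the four images."""
    spreads = []
    for values in zip(*rows.values()):
        mean = sum(values) / len(values)
        spreads.append((max(values) - min(values)) / mean)
    return spreads


print([round(s, 4) for s in relative_spread(mm1_mm6)])
print([round(s, 4) for s in relative_spread(mm4_mm9)])
```

For the blob set Mm1 to Mm6 every invariant varies by less than 1% across the four images, whereas I3 for Mm4 to Mm9 varies by about 7%.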

We analyzed the hierarchical events at which the topology of the zero-crossing surfaces of the scale-space changes as a density object changes smoothly. The analysis is useful for generating the 3D density process-grammar from a limited number of observations, and the resulting tool is useful for analyzing a non-rigid density object hierarchically.



Image  Blob set   I1        I2        I3
s1     Mm1-Mm6    1.060174  0.860518  2.143412
s1     Mm2-Mm7    1.103340  0.927175  3.230862
s1     Mm3-Mm8    1.116745  0.948955  3.939625
s1     Mm4-Mm9    1.114275  0.985219  4.950984
s2     Mm1-Mm6    1.060174  0.860518  2.143412
s2     Mm2-Mm7    1.103340  0.927175  3.230862
s2     Mm3-Mm8    1.116745  0.948955  3.939625
s2     Mm4-Mm9    1.114275  0.985219  4.950984
s3     Mm1-Mm6    1.062561  0.856981  2.143711
s3     Mm2-Mm7    1.113535  0.915727  3.256538
s3     Mm3-Mm8    1.127873  0.934663  3.927846
s3     Mm4-Mm9    1.114944  0.985621  4.999187
s4     Mm1-Mm6    1.062669  0.855323  2.136194
s4     Mm2-Mm7    1.111324  0.903093  3.049251
s4     Mm3-Mm8    1.129468  0.933878  3.954875
s4     Mm4-Mm9    1.114350  0.993526  5.314854

Table 1. Geometric invariances I1, I2, and I3 obtained from the 3D density scale-space for four sequential images (s1 to s4) of a moving heart, computed for successive sets of six KO blobs.
Abstract. In this paper, we introduce the linear scale-space theory for functions on finite graphs. This theory permits us to derive a discrete version of the mean curvature flow, which yields a deformation procedure for polyhedrons. The adjacency matrix and the degree matrix of a polyhedral graph describe the system equation of this polyhedral deformation, and the spectral theory of graphs yields the stability condition of the polyhedral deformation.



1 Introduction 

Mean curvature flow has advantages for shape deformation and signal smoothing [1] [2]. The two-dimensional version of mean curvature flow is applied to shape analysis through the evolution of boundary curves and iso-level contours [2]. Bruckstein et al. [3] proposed a discrete version of mean curvature flow for planar closed polygonal curves. Lindeberg [4] proposed a theory of discrete linear scale-space on the infinite lattice. We derive a linear scale-space theory for finite graphs. This theory permits us to define a discrete linear scale-space theory for grids in finite regions with appropriate boundary conditions.

In this paper, we derive a theory of discrete mean curvature flow for higher-dimensional discrete objects that is applicable to open curves and open surfaces with appropriate boundary conditions. We define the diffusion procedure on graphs. This theory permits us to derive the scale-space theory for functions on graphs and a discrete version of the mean curvature flow. The discrete version of this equation extends the treatment in reference [3] from closed polygonal curves to polyhedrons and to closed and open spatial curves, and we also show that our extension contains the results of reference [3].

2 Discrete Curvature Flow 

A polyhedron consists of a triplet of graphs, with values attached to the vertices, that have the same topological structure. These values express the x, y, and z coordinates of the vertex vectors of the polyhedron, respectively. Setting a finite set of three-dimensional vectors $V = \{p_i\}_{i=0}^{N-1}$ to be the vertices of a polyhedron, we define the neighborhood of each vertex as

$$V(i) = \{\, p_{i(1)}, p_{i(2)}, \ldots, p_{i(k)} \mid i(k+1) = i(1) \,\}, \qquad (1)$$

such that $p_{i(j)}$ and $p_{i(j+1)}$ are connected by an edge.

Using the vectors $p_i$, we define the edge vector and the face vector as

$$t_{ij} = p_i - p_j, \qquad f_{ij} = (p_{i(j)} - p_i) \times (p_{i(j+1)} - p_i). \qquad (2)$$

The edge vector connects a vertex to a vertex in its neighborhood. The face vector is the normal vector of the triangle defined by a vertex and a pair of vertices in its neighborhood that are connected by an edge. Furthermore, we define the following vectors:

$$v_i = \sum_{j=i(1)}^{i(k)} t_{ij}, \qquad f_i = \sum_{j=i(1)}^{i(k)} f_{ij}, \qquad t_{mn}(i) = (p_i - p_m) + (p_i - p_n), \qquad (3)$$

where $p_m$ and $p_n$ do not lie on the same face. We call $v_i$, $f_i$, and $t_{mn}(i)$ the vertex normal, the face normal, and a path normal of vertex $p_i$, respectively. The vertex normal is the sum of all edge vectors connected to a vertex; the face normal is the sum of the normal vectors of all triangles formed by the vertex and two points in its neighborhood that are connected by an edge; and a path normal is the sum of a pair of edge vectors. From these notations, the following definitions are derived.

Definition 1 The path normal and face normal vectors classify the geometric properties of a vertex in each neighborhood as follows.

1. If $t_{mn}(i)^\top f_i > 0$ for all m and n in V(i), then $p_i$ is quasi-convex.

2. If $t_{mn}(i)^\top f_i = 0$ for all m and n in V(i), then $p_i$ is flat.

3. If $t_{mn}(i)^\top f_i < 0$ for all m and n in V(i), then $p_i$ is quasi-concave.

4. If the sign of $t_{mn}(i)^\top f_i$ depends on m and n in V(i), then $p_i$ is a saddle point.
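Definition 1 translates into a short vertex classifier. This sketch assumes the neighbours are listed in the cyclic order of eq. (1), and it interprets "$p_m$ and $p_n$ do not lie on the same face" as "m and n are not consecutive in that order" (in a triangle fan, consecutive neighbours share a face with $p_i$). The function name and the tolerance `tol` for the flat case are our own.

```python
import numpy as np


def classify_vertex(p, ring, tol=1e-12):
    """Classify vertex p by Definition 1.

    ring lists the neighbours p_{i(1)}, ..., p_{i(k)} in cyclic order (k >= 4).
    """
    p = np.asarray(p, dtype=float)
    ring = np.asarray(ring, dtype=float)
    k = len(ring)
    # Face normal f_i: sum of (p_{i(j)} - p) x (p_{i(j+1)} - p) over the fan.
    f = sum(np.cross(ring[j] - p, ring[(j + 1) % k] - p) for j in range(k))
    signs = set()
    for m in range(k):
        for n in range(m + 1, k):
            if (n - m) % k in (1, k - 1):
                continue  # consecutive neighbours share a face with p: skip
            t_mn = (p - ring[m]) + (p - ring[n])  # path normal t_mn(i)
            value = float(t_mn @ f)
            signs.add(0 if abs(value) <= tol else int(np.sign(value)))
    if not signs:
        raise ValueError("need at least four neighbours")
    if signs == {1}:
        return "quasi-convex"
    if signs == {0}:
        return "flat"
    if signs == {-1}:
        return "quasi-concave"
    return "saddle"


square = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
print(classify_vertex((0, 0, 1), square))   # quasi-convex
print(classify_vertex((0, 0, 0), square))   # flat
print(classify_vertex((0, 0, 0), [(1, 0, 1), (0, 1, -1), (-1, 0, 1), (0, -1, -1)]))  # saddle
```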

Since, in nonlinear diffusion, the localization conditions are described as a function of the gradient magnitude at each point, we define vector operations for the polyhedral vertices. We define the gradient on a polyhedral surface by

$$\nabla p_i = (p_{i(1)i}, \ldots, p_{i(k)i})^\top, \qquad p_{ji} = p_j - p_i. \qquad (4)$$

The total sum of the lengths of the gradients on a surface can be used as a measure of the total roughness of the surface [5]. Therefore, we define the mean curvature of vertex $p_i$ using the lengths of the gradients on a polyhedron.

Definition 2 Setting $v_i$ to be the average of the elements of the gradient at vertex $p_i$, we define a discrete version of the mean curvature at each vertex as

$$k_i = \operatorname{sgn}(v_i^\top f_i) \cdot \sum_{j=i(1)}^{i(k)} |p_{ji}|. \qquad (5)$$
This definition is used to derive a discrete version of the mean curvature flow,

$$p_i(t+1) - p_i(t) = h(t)\, \operatorname{sgn}(v_i(t)^\top f_i(t))\, k_i(t)\, \frac{v_i(t)}{|v_i(t)|}, \qquad (6)$$

where t is the iteration step, and $k_i(t)$ and $v_i(t)/|v_i(t)|$ are the discrete mean curvature and the normal at vertex $p_i$, respectively. The vector $v_i$ points outward if $p_i$ is quasi-convex and inward otherwise. Therefore, $\operatorname{sgn}(v_i(t)^\top f_i(t))\, v_i(t)/|v_i(t)|$ determines the outward normal at vertex $p_i$.

A linearization of eq. (6) is

$$p_i(t+1) - p_i(t) = \alpha v_i \qquad (7)$$

for a nonzero constant α. Here, we deal with eq. (7) and show some topological and geometrical properties of this equation. Setting

$$X(t) = (p_0(t), p_1(t), \ldots, p_{N-1}(t))^\top, \qquad X(0) = (p_0(0), p_1(0), \ldots, p_{N-1}(0))^\top \qquad (8)$$

for $p_i(t) = (x_i(t), y_i(t), z_i(t))^\top$ and $p_i(0) = (x_i, y_i, z_i)^\top$, we obtain the matrix form of eq. (7),

$$X(t+1) - X(t) = \alpha (A - D) X(t), \qquad (9)$$

where A and D are the adjacency matrix and the degree matrix of the graph derived from polyhedron V, respectively. Here, X(t) is an N × 3 matrix function and (A - D) is an N × N matrix operator. Therefore, eq. (9) can be understood as the curvature flow on a geometric graph in three-dimensional Euclidean space. The method deforms a polyhedron by changing the positions of its vertices using a linear sum of the edge vectors.
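The linear flow (9) takes only a few lines of NumPy. As a sketch, we use the simplest closed case, a spatial polygon whose graph is a cycle, and iterate eq. (9) with α = 0.25; the helper names and the random test polygon are illustrative. Because A - D is symmetric and each of its rows sums to zero, the centroid is preserved, and the vertices contract toward it.

```python
import numpy as np


def cycle_adjacency(n: int) -> np.ndarray:
    """Adjacency matrix of a closed polygon (cycle graph), n >= 3."""
    a = np.zeros((n, n))
    idx = np.arange(n)
    a[idx, (idx + 1) % n] = 1.0
    a[idx, (idx - 1) % n] = 1.0
    return a


def linear_flow(x0: np.ndarray, a: np.ndarray, alpha: float, steps: int) -> np.ndarray:
    """Iterate X(t+1) = X(t) + alpha (A - D) X(t), eq. (9)."""
    laplacian = a - np.diag(a.sum(axis=1))
    x = x0.astype(float).copy()
    for _ in range(steps):
        x = x + alpha * (laplacian @ x)
    return x


rng = np.random.default_rng(1)
x0 = rng.normal(size=(12, 3))          # a random spatial 12-gon (N x 3)
x_end = linear_flow(x0, cycle_adjacency(12), alpha=0.25, steps=2000)
print(x0.mean(axis=0), x_end[0])       # every vertex ends near the centroid
```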

Setting $\lambda_i$ and $w_i$ to be the eigenvalues of L = A - D and the corresponding eigenvectors of unit length, since $L^* = L$, the unitary matrix $W = (w_0, w_1, \ldots, w_{N-1})$ diagonalizes L as $LW = W\Lambda$ for the diagonal matrix $\Lambda = \mathrm{diag}(\lambda_0, \lambda_1, \ldots, \lambda_{N-1})$. The spectral decomposition $L = W\Lambda W^*$ yields

$$X(t) = W (I + \alpha\Lambda)^t W^* X(0) \qquad (10)$$

for the computation of $X(\infty)$. We assume that the Laplacian matrix is normalized such that $|1 + \alpha\lambda_i| \le 1$ for $i = 0, 1, \ldots, N-1$, since we deal with a shape in a finite region of $\mathbb{R}^3$. The geometric version of the corresponding result for graphs is as follows.

Theorem 1 Setting $\{w_0, w_1, \ldots, w_\gamma\}$, with $\gamma \le N-1$, to be the eigenvectors of the Laplacian matrix of a polyhedron that satisfy $|1 + \alpha\lambda_i| = 1$ for $i = 0, 1, \ldots, \gamma$, we have

$$\lim_{t \to \infty} X(t) = \widetilde{W} \widetilde{W}^* X(0) \qquad (11)$$

for $\widetilde{W} = (w_0, w_1, \ldots, w_\gamma)$.

As in the case of graphs, the process orthogonally projects each column of X(0) onto the space spanned by $\{w_i\}_{i=0}^{\gamma}$. Furthermore, this result shows that the analysis employed above is not restricted to graphs that represent polyhedrons in $\mathbb{R}^3$: Theorem 1 holds for any polytope in $\mathbb{R}^n$. In this case, X(t) is an N × n matrix, each row of which gives the position of a vertex.
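The spectral form of the solution and the limit in Theorem 1 can be checked numerically. This sketch (cycle graph, α = 0.25, all names ours) compares $W(I+\alpha\Lambda)^t W^\top X(0)$, with Λ the diagonal matrix of eigenvalues of L = A - D, against direct iteration, and builds the projector onto the eigenvectors with $|1+\alpha\lambda_i| = 1$. Here only the constant eigenvector survives, so the limit is the centroid.

```python
import numpy as np

n, alpha = 12, 0.25
idx = np.arange(n)
a = np.zeros((n, n))
a[idx, (idx + 1) % n] = 1.0
a[idx, (idx - 1) % n] = 1.0
lap = a - np.diag(a.sum(axis=1))          # L = A - D, symmetric

lam, w = np.linalg.eigh(lap)               # L W = W diag(lam), W orthogonal
rng = np.random.default_rng(2)
x0 = rng.normal(size=(n, 3))


def closed_form(t: int) -> np.ndarray:
    """X(t) = W (I + alpha Lambda)^t W^T X(0)."""
    return w @ np.diag((1.0 + alpha * lam) ** t) @ w.T @ x0


def iterate(t: int) -> np.ndarray:
    """Direct iteration of X(t+1) = X(t) + alpha L X(t)."""
    x = x0.copy()
    for _ in range(t):
        x = x + alpha * (lap @ x)
    return x


# Eigenvectors with |1 + alpha * lambda| = 1 span the limit space of Theorem 1.
keep = np.isclose(np.abs(1.0 + alpha * lam), 1.0)
w_keep = w[:, keep]
limit = w_keep @ w_keep.T @ x0
print(keep.sum(), np.allclose(limit, x0.mean(axis=0)))  # 1 True
```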
3 Deformation as Signal Processing

3.1 Matrix and Topology 

If a graph is regular, then D = cI, where c is a positive integer and I is the identity matrix, so the matrices A and L = A - D have the same eigenvectors. Furthermore, if γ is an eigenvalue of A, then γ - c is an eigenvalue of L. Therefore, in this section we concentrate on regular graphs, whose degree matrix is cI. We write down the adjacency matrices that define the topological properties of geometric objects such as a torus and a sphere. Setting the m × m matrices $F_m$ and $C_m$ to be

$$F_m = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 1 \\ 1 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 & 1 \\ 1 & 0 & \cdots & 0 & 1 & 0 \end{pmatrix}, \qquad C_m = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 & 1 \\ 0 & \cdots & 0 & 1 & 1 \end{pmatrix}, \qquad (12)$$

where $F_m$ is a circulant matrix, the matrices T and S,

$$T = F_m \otimes I_n + I_m \otimes F_n, \qquad S = F_m \otimes I_n + I_m \otimes C_n, \qquad (13)$$

define systems of grids that are topologically equivalent to a torus and a sphere, respectively. These matrices are the adjacency matrices of grids in which every vertex has degree four. Therefore, setting



$$D_4 = \mathrm{diag}(4, 4, \ldots, 4), \qquad (14)$$

the matrices $L_T = T - D_4$ and $L_S = S - D_4$ are the discrete Laplacian operators with the circular boundary conditions $f(x, y) = f(x+1, y)$ and $f(x, y) = f(x, y+1)$, and with the boundary condition $f'(0, 0, 1) = f'(0, 0, -1) = 0$, respectively [6,7]. A path graph whose adjacency matrix is $C_m$ has a self-loop at each endpoint, which means that this graph is not regular. However, it behaves like a regular graph, because every row of $C_m$ sums to 2. Therefore, a graph described by S can be treated as a regular graph of degree four.
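The grids of eq. (13) can be assembled with Kronecker products. This sketch (sizes m = 6, n = 5 and helper names are illustrative) builds T and S, confirms that every row sums to four when the self-loops of $C_n$ are counted, and checks that the spectra are the sums $2\cos(2\pi i/m) + 2\cos(2\pi j/n)$ for T and $2\cos(2\pi i/m) + 2\cos(\pi j/n)$ for S, as expected for Kronecker sums.

```python
import numpy as np


def f_matrix(m: int) -> np.ndarray:
    """F_m: adjacency matrix of the m-cycle (circulant), m >= 3."""
    idx = np.arange(m)
    f = np.zeros((m, m))
    f[idx, (idx + 1) % m] = 1.0
    f[idx, (idx - 1) % m] = 1.0
    return f


def c_matrix(n: int) -> np.ndarray:
    """C_n: path adjacency with a self-loop at each endpoint, n >= 2."""
    c = np.zeros((n, n))
    i = np.arange(n - 1)
    c[i, i + 1] = 1.0
    c[i + 1, i] = 1.0
    c[0, 0] = c[-1, -1] = 1.0
    return c


m, n = 6, 5
T = np.kron(f_matrix(m), np.eye(n)) + np.kron(np.eye(m), f_matrix(n))  # torus grid
S = np.kron(f_matrix(m), np.eye(n)) + np.kron(np.eye(m), c_matrix(n))  # sphere grid
print(T.sum(axis=1).min(), T.sum(axis=1).max(), S.sum(axis=1).min(), S.sum(axis=1).max())  # 4.0 4.0 4.0 4.0

# Eigenvalues of a Kronecker sum are the pairwise sums of the factors' eigenvalues.
lam_m = 2 * np.cos(2 * np.pi * np.arange(m) / m)
lam_n = 2 * np.cos(2 * np.pi * np.arange(n) / n)
sig_n = 2 * np.cos(np.pi * np.arange(n) / n)
expected_T = np.sort(np.add.outer(lam_m, lam_n).ravel())
expected_S = np.sort(np.add.outer(lam_m, sig_n).ravel())
print(np.allclose(np.linalg.eigvalsh(T), expected_T),
      np.allclose(np.linalg.eigvalsh(S), expected_S))  # True True
```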



3.2 Eigenvectors of Deformations 

The eigenvalues and eigenvectors of matrix $F_m$ are

$$\lambda_k^m = 2\cos 2\alpha_k^m, \qquad u_k^m = (1, \omega^k, \ldots, \omega^{(m-1)k})^\top, \qquad (15)$$

for $\alpha_k^m = k\pi/m$ and $\omega = e^{2\pi\sqrt{-1}/m}$, so that $\omega^m = 1$ and $\omega \ne 1$. This means that the eigenvectors of $F_m$ define the discrete Fourier transform [7,8]. Furthermore, the eigenvalues and eigenvectors of $C_n$ are

$$\sigma_k^n = 2\cos 2\beta_k^n, \qquad v_k^n = (\cos\beta_k^n, \cos 3\beta_k^n, \ldots, \cos(2n-1)\beta_k^n)^\top, \qquad (16)$$

for $\beta_k^n = k\pi/2n$. This means that the eigenvectors of $C_n$ define the discrete cosine transform [7,8]. Moreover, for $i = 0, 1, \ldots, m-1$ and $j = 0, 1, \ldots, n-1$, we can adopt

$$x_{ij} = u_i^m \otimes u_j^n, \qquad y_{ij} = u_i^m \otimes v_j^n \qquad (17)$$

as the discrete spherical harmonics on the torus and the sphere, respectively, since T and S describe the topological properties of grids on tori and spheres, and the vectors $x_{ij}$ and $y_{ij}$ are the eigenvectors of T and S, respectively.
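The eigenpairs in eqs. (15) and (16) are easy to verify numerically. This sketch (the sizes m = 8 and n = 6 are arbitrary) checks that the DFT vectors are eigenvectors of $F_m$ with eigenvalues $2\cos(2\pi k/m)$, and that the DCT-II vectors with entries $\cos((2j+1)\beta_k)$, $\beta_k = k\pi/2n$, are eigenvectors of $C_n$ with eigenvalues $2\cos 2\beta_k$.

```python
import numpy as np

m, n = 8, 6
# F_m (cycle) and C_n (path with end self-loops), as in eq. (12).
idx = np.arange(m)
F = np.zeros((m, m))
F[idx, (idx + 1) % m] = 1.0
F[idx, (idx - 1) % m] = 1.0
i = np.arange(n - 1)
C = np.zeros((n, n))
C[i, i + 1] = C[i + 1, i] = 1.0
C[0, 0] = C[-1, -1] = 1.0

omega = np.exp(2j * np.pi / m)                    # e^{2 pi sqrt(-1) / m}
for k in range(m):
    u = omega ** (k * np.arange(m))               # DFT basis vector, eq. (15)
    assert np.allclose(F @ u, 2 * np.cos(2 * np.pi * k / m) * u)

for k in range(n):
    beta = k * np.pi / (2 * n)
    v = np.cos((2 * np.arange(n) + 1) * beta)     # DCT-II basis vector, eq. (16)
    assert np.allclose(C @ v, 2 * np.cos(2 * beta) * v)
print("eigenpairs of F_m and C_n confirmed")
```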

The DFT matrix U and the DCT (discrete cosine transform) matrix V, whose columns are the normalized eigenvectors,

$$U = \left(\frac{u_0^m}{|u_0^m|}, \ldots, \frac{u_{m-1}^m}{|u_{m-1}^m|}\right), \qquad V = \left(\frac{v_0^n}{|v_0^n|}, \ldots, \frac{v_{n-1}^n}{|v_{n-1}^n|}\right), \qquad (18)$$

diagonalize $F_m$ and $C_n$, respectively. Therefore, these structures imply that the properties of the discrete curvature flow are described by the DFT and the DCT if shapes are approximated by appropriate graphs. That is, if the grids for the computation of the curvature flow are sufficiently fine, the linear approximation of the equations on polyhedrons is sufficient, so that the FFT (fast Fourier transform) and the FCT (fast cosine transform) achieve rapid numerical computation of the flow on polyhedrons.
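To illustrate the FFT route in the simplest closed case, the sketch below applies the linear flow with $A = F_m$ and $D = 2I$ to a closed spatial m-gon. Since $F_m$ is circulant, each DFT coefficient is simply multiplied by $(1 + \alpha(2\cos(2\pi k/m) - 2))^t$, which costs O(m log m) per coordinate instead of repeated matrix products. The function names, α = 0.2, and the test polygon are assumptions for the example.

```python
import numpy as np


def flow_fft(x0: np.ndarray, alpha: float, steps: int) -> np.ndarray:
    """X(t) = (I + alpha (F_m - 2I))^t X(0) for a closed m-gon, computed with the FFT.

    F_m is circulant, so the DFT diagonalizes it; mode k has eigenvalue 2 cos(2 pi k / m).
    """
    m = x0.shape[0]
    gain = 1.0 + alpha * (2.0 * np.cos(2.0 * np.pi * np.arange(m) / m) - 2.0)
    coeffs = np.fft.fft(x0, axis=0) * (gain ** steps)[:, None]
    return np.real(np.fft.ifft(coeffs, axis=0))


def flow_direct(x0: np.ndarray, alpha: float, steps: int) -> np.ndarray:
    """Reference: a matrix power of the dense iteration matrix I + alpha (F_m - 2I)."""
    m = x0.shape[0]
    idx = np.arange(m)
    f = np.zeros((m, m))
    f[idx, (idx + 1) % m] = 1.0
    f[idx, (idx - 1) % m] = 1.0
    step = np.eye(m) + alpha * (f - 2.0 * np.eye(m))
    return np.linalg.matrix_power(step, steps) @ x0


rng = np.random.default_rng(3)
x0 = rng.normal(size=(64, 3))      # a random closed spatial 64-gon
print(np.allclose(flow_fft(x0, 0.2, 50), flow_direct(x0, 0.2, 50)))  # True
```

For the torus grid T the same idea would use a two-dimensional FFT over the two grid axes, and for S a type-II DCT (for example `scipy.fft.dct`) along the axis governed by $C_n$; those variants are not shown.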

3.3 Stability and Convergence 

Using the properties of the eigenvalues and eigenvectors of these transformations, we obtain the following theorems on the stability and final shapes of polyhedrons whose vertex connections are expressed by the matrices T and S.

Theorem 2 The two discrete systems

$$X(t+1) = \{I + \alpha_T (T - D_4)\} X(t), \qquad X(t+1) = \{I + \alpha_S (S - D_4)\} X(t) \qquad (19)$$

are stable if $\alpha_T$ and $\alpha_S$ satisfy

$$0 < \alpha_T < \frac{1}{4}, \qquad 0 < \alpha_S < \frac{1}{4}, \qquad (20)$$

respectively.
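The stability of the torus system can be examined through the spectral radius of its iteration matrix $I + \alpha_T(T - D_4)$. The eigenvalues of $T - D_4$ lie in [-8, 0], with -8 attained when m and n are both even, so the radius stays at 1 (the centroid mode) up to $\alpha_T = 1/4$ and exceeds 1 beyond it. The sketch below (grid size and sampled α values chosen for illustration) checks this numerically.

```python
import numpy as np


def cycle_adjacency(m: int) -> np.ndarray:
    """F_m, the adjacency matrix of an m-cycle (m >= 3)."""
    idx = np.arange(m)
    f = np.zeros((m, m))
    f[idx, (idx + 1) % m] = 1.0
    f[idx, (idx - 1) % m] = 1.0
    return f


def spectral_radius_torus(m: int, n: int, alpha: float) -> float:
    """Spectral radius of I + alpha (T - D_4) for the m x n torus grid."""
    t = np.kron(cycle_adjacency(m), np.eye(n)) + np.kron(np.eye(m), cycle_adjacency(n))
    step = np.eye(m * n) + alpha * (t - 4.0 * np.eye(m * n))
    return float(np.max(np.abs(np.linalg.eigvalsh(step))))


for alpha in (0.10, 0.24, 0.25, 0.30):
    print(alpha, round(spectral_radius_torus(6, 6, alpha), 6))
```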

Theorem 3 The final shapes of polyhedrons whose vertex connections are expressed using T and S are their centroids.

Theorem 4 The asymptotic shapes of polyhedrons whose vertex connections are expressed using T and S are general ellipsoids¹ as t, the number of iterations, approaches infinity.

¹ Setting $\{\partial E_i\}_{i=1}^{k}$ to be a collection of the boundaries of ellipses, we call $\partial E = \partial E_1 \oplus \partial E_2 \oplus \cdots \oplus \partial E_k$ a general discrete ellipse, where $A \oplus B$ denotes the Minkowski addition of two point-sets A and B. Let A be a set of points in three-dimensional Euclidean space. If all slices of A perpendicular to $(1,0,0)^\top$, $(0,1,0)^\top$, and $(0,0,1)^\top$ are general ellipses, we call the set a general ellipsoid.
Since our definitions of the adjacency matrices are valid for nonsimple polyhedrons, these theorems imply that discrete mean curvature flow does not preserve the topologies of objects in their final and asymptotic forms. Similar properties were proven for planar polygons by Bruckstein et al. [3]. Our results agree with theirs when our objects are planar polygons. Furthermore, our results show that the results of Bruckstein et al. for planar polygons are also valid for both simple and nonsimple spatial polygons. For the proofs of these theorems, see reference [9].

4 Conclusions 

We derived a discrete approximation of curvature flow and showed that it is closely related to graph theory. The graph-theoretic description of the problem reveals the mechanical aspects of the discrete curvature flow; that is, the graph-theoretic treatment combines a theory of linear approximation of the curvature flow with numerical analysis. We also showed that a polyhedral approximation of a surface is described by a system of grids of degree four if we approximate the surface using two parameters. This approximation implies that the properties of the discrete curvature flow are described using the DFT (discrete Fourier transform) and the DCT (discrete cosine transform), since the eigenvectors of the adjacency matrices of these grids define the DFT and DCT matrices.
Abstract. We propose a method for contour figure approximation that does not assume the shape of primitives for contours. A given contour figure is approximated by smoothing out only its local details with a curvature flow process. The amount of smoothing is determined adaptively from the sizes of the local details. To detect local details and to determine the amount of local smoothing, the method uses scale-space analysis. Experimental results show that this approximation method has properties preferable for contour figure recognition; e.g., only a finite number of approximations are obtained from a given contour figure.



1 Introduction 

Describing the global shape of a contour figure is essential for contour figure recognition, and describing the global shape of a given contour amounts to approximating the contour figure. A contour can be approximated by replacing local details with simple shape primitives. For example, a line segment approximation replaces small shape details with line segments. Many contour approximation methods have been proposed, and most of them assume the shape of the replacing primitives. As a result, those methods often fail to describe the global shape of a given contour; for example, a line segment approximation fails to approximate a round circular contour.

We propose an approximation method that does not assume primitive shapes. The method approximates a given contour by smoothing out only its local details. To detect local details and to determine the amount of local smoothing adaptively, we use a smoothing process known as curvature flow. Curvature flow is a smoothing process for contours that has a scale parameter. As the scale increases, the contour is smoothed more, and its curvature approaches a constant. We define the shape components of a given contour by the inflection points of the curvature. As the scale increases, the shape components of the given contour disappear one after another. The size of each shape component is defined as the scale at which it disappears. Shape components of small size correspond to local details, and large ones correspond to global shapes [1].
Our method first smooths a given contour figure and obtains the fingerprint pattern of curvature inflection points in the scale space. Then, referring to the fingerprint pattern, shape components smaller than a given scale are detected; this scale is called the base scale in this paper. Finally, the given contour figure is approximated by smoothing only the smaller shape components, with the amount of smoothing determined adaptively from the size of each shape component. As shown later, even if the base scale is changed continuously, this method produces only a finite number of approximations from a given contour. It is also shown that the approximations obtained from an approximated contour figure are included in the approximations obtained from the original contour figure.

2 Smoothing of Contour Figure 

To detect shape components on a given contour, we use a smoothing process known as curvature flow. Let us consider a contour represented as C(u) = (x(u), y(u)), where u is a position parameter along the contour. From this contour C(u), smoothed contours F(u, t) are obtained by solving the following equations:

$$F(u, 0) = C(u), \qquad \frac{\partial F(u, t)}{\partial t} = -\kappa N, \qquad (1)$$

where κ is the curvature and N is the unit outward normal vector of F(u, t). The parameter t (≥ 0) is called the scale parameter. We assume that C(u) is a simple, closed, and piecewise plane curve. As t increases, the contour becomes smoother. This smoothing process has the following properties, which suit our purpose [2]:

- Increasing t generates no new curvature inflection point.

- An inflection point disappears only when it meets another inflection point.

- Any contour converges to a round circle, then to a point, and disappears when t reaches A/2π, where A is the area enclosed by the given original contour.

An example of a set of F(u, t) is shown in Fig. 1 (left). As t increases, the shape of the contour approaches a round circle.

To obtain the smoothed contours F(u, t), either the level set method [3] or Gaussian filtering [4] can be used. We employ the latter because, in the proposed method, every point on a given contour is traced through the smoothing process.
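A minimal sketch of this step, assuming a uniformly sampled closed contour and using the sample index as the position parameter u: it smooths the coordinate functions with a periodic Gaussian, computes the curvature κ, and locates the inflection points of the u-κ graph as the places where its discrete second difference changes sign. Gaussian filtering of the coordinates only approximates curvature flow; the relation between σ and t is not taken from the paper, and the three-lobed test contour is made up.

```python
import numpy as np


def smooth_closed(values: np.ndarray, sigma: float) -> np.ndarray:
    """Circular convolution of a periodic signal with a Gaussian of std sigma (samples)."""
    freqs = np.fft.fftfreq(len(values))                      # cycles per sample
    transfer = np.exp(-2.0 * (np.pi * sigma * freqs) ** 2)    # Fourier transform of the Gaussian
    return np.real(np.fft.ifft(np.fft.fft(values) * transfer))


def curvature(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """kappa = (x' y'' - y' x'') / (x'^2 + y'^2)^(3/2) with periodic central differences."""
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    return (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5


def curvature_inflections(kappa: np.ndarray) -> np.ndarray:
    """Indices where the u-kappa graph switches between convex and concave."""
    d2 = np.roll(kappa, -1) - 2.0 * kappa + np.roll(kappa, 1)
    convex = d2 >= 0.0
    return np.flatnonzero(convex != np.roll(convex, 1))


theta = np.linspace(0.0, 2.0 * np.pi, 300, endpoint=False)
r = 1.0 + 0.3 * np.cos(3.0 * theta)                          # a three-lobed test contour
x, y = r * np.cos(theta), r * np.sin(theta)
for sigma in (2.0, 10.0):
    k = curvature(smooth_closed(x, sigma), smooth_closed(y, sigma))
    print(sigma, len(curvature_inflections(k)))
```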

3 Adaptive Local Smoothing for Approximation 

As described above, any contour figure converges to a round circle in the smoothing process. This means that as the scale t increases, the curvature of the smoothed contour becomes more even, approaching a constant. When we plot the u-κ graph at every scale, the graph becomes more even and flat as the scale increases.
Fig. 1. Left: An example of smoothed contours. The total length of the given contour is 540, and t = 0, 100, 500, 900, and 1300, respectively, from bottom to top. Right: An example of a fingerprint pattern of inflection points.



We define the shape components of a contour figure using the inflection points of the curvature. An inflection point of the curvature is a point between a concave part and a convex part of the u-κ graph of the contour figure. As the scale t increases, the number of inflection points decreases, and the u-κ graph becomes flat. As t increases, no new inflection point is generated, and inflection points disappear only when two or more of them meet. We define a shape component as the part of a given contour between two inflection points that meet when they disappear. We also define the size of the shape component as the scale at which the two inflection points meet and disappear.

In general, an original contour figure has many shape components of various sizes. In the smoothing process, as the scale t increases, the shape components disappear one after another, and finally all of them disappear and the contour becomes a round circle.

Our method approximates a given contour figure by smoothing out only small shape components. The amount of smoothing for each shape component is determined adaptively from the size of the component. To detect the shape components of a given contour and their sizes, we use scale-space analysis.

When we plot the inflection points in the scale space, we obtain the so-called fingerprint pattern. Figure 1 (right) shows an example of the fingerprint pattern of inflection points. Here, the scale space is the space spanned by the original position u and the scale t. Every curve of the fingerprint pattern closes upward, as shown in Fig. 1. In the scale space, each area enclosed by a curve of the fingerprint pattern corresponds to a shape component, and the height of each area gives the size of that shape component.
Fig. 2. The proposed approximation method. (a) The fingerprint pattern of inflection points. (b) Area B shows the area of the shape components whose scale is smaller than $t_0$. (c) At each scale, the part of the contour indicated by Area B is smoothed.



We approximate a given contour figure by smoothing out the shape components whose sizes are smaller than a given scale $t_0$, which we call the base scale. The amount of local smoothing is determined adaptively from the size of each shape component. Because the shape components larger than $t_0$ remain as they are on the original figure, the results may still contain small notches belonging to larger shape components. The algorithm is as follows.

1. Obtain the fingerprint pattern of a given contour on a scale space whose axes are the original arc length u and the scale t. See Fig. 2(a).

2. Set the base scale $t_0$, and draw the line $t = t_0$ on the scale space.

3. Divide the scale space into areas A and B, bounded by the inflection point pattern, where area A includes the line $t = t_0$ and area B does not. See Fig. 2(b).

4. Smooth only the shape components of the given contour that are smaller than $t_0$; that is, obtain a contour $H(s, t_0)$ that satisfies (2). See Fig. 2(c).

$$H(s, 0) = C(s), \qquad \frac{\partial H(s, t)}{\partial t} = \begin{cases} 0 & \text{if } (s, t) \in \text{area A}, \\ -\kappa N & \text{if } (s, t) \in \text{area B}. \end{cases} \qquad (2)$$

It should be noted that, even if the base scale $t_0$ is changed continuously, this process yields only a finite number of approximations, because the shape of $H(s, t_0)$ changes only when $t_0$ crosses a closing point of the inflection pattern. We show some preferable properties of this approximation method with experimental results in the next section.
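The finiteness argument can be made concrete. Eq. (2) smooths exactly the shape components whose size is below $t_0$, so the approximation can change only when $t_0$ passes one of the component sizes. This sketch uses a hypothetical `ShapeComponent` record for the closed curves of the fingerprint pattern and counts the distinct selections while $t_0$ sweeps a range; the component data are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeComponent:
    """One closed curve of the fingerprint pattern (hypothetical representation)."""
    u_start: float  # arc-length position where the component begins
    u_end: float    # arc-length position where it ends
    size: float     # scale at which its two inflection points meet


def smoothed_components(components, t0):
    """Components lying in area B for base scale t0, i.e. those smaller than t0."""
    return frozenset(c for c in components if c.size < t0)


components = [
    ShapeComponent(10.0, 40.0, 50.0),
    ShapeComponent(100.0, 130.0, 120.0),
    ShapeComponent(300.0, 320.0, 120.0),
    ShapeComponent(0.0, 270.0, 400.0),
]
# Sweep t0 over 0, 1, ..., 1000 and collect the distinct selections.
distinct = {smoothed_components(components, t0) for t0 in range(0, 1001)}
print(len(distinct))  # 4
```

With three distinct component sizes, only four selections occur, so at most four different approximations can result.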

4 Experimental Results of Approximations 

Figures 3(A) and (B) show the approximations obtained from the contours of two key silhouettes. As described in the previous section, by changing the base scale $t_0$, we obtain only a finite number of approximated contours from a given contour. Fig. 3 shows all approximated contours obtained from the originals. The two keys shown in Fig. 3 differ only in the local shapes of their head parts. Their approximations with large base scales $t_0$ have similar shapes because only their head parts are smoothed adaptively.

Fig. 3. Results of the proposed approximation method for key silhouette contours. In each box, the leftmost contour is the original. All approximations obtained from the originals are shown in this figure. The contours in box (D) are approximations obtained from the approximated contour shown in box (A).

Figure 3(C) shows another experimental result. From a given snowflake-shaped contour, six approximations were obtained. Each approximation characterizes the global shape of the given contour at its respective level. The series of approximations obtained from a given contour is a kind of hierarchical shape description of the contour.

Figure 3(D) shows approximations of an approximated contour. The series of approximations of the approximated contour in (A) is entirely included in the series of approximations of the original contour figure. This inclusion property is important for judging the similarity between contour shapes.

As shown in Fig. 3, in the series of approximations of a given contour, some 
parts change their shapes several times, but some parts do not. In order to 
construct a hierarchical description for each part of a given contour, we split the 
approximated contours at corner points. 

A corner point on a contour figure is a point at which the absolute value of the curvature is locally maximal. In the smoothing process, as the scale t increases, the number of corner points decreases, and no new corner point is generated. To split the approximated contours appropriately, we split them at the corner points that do not disappear in the smoothing process until the largest shape component disappears. We call such corner points dominant corner points [5]. By splitting the approximated contours at dominant corner points, we construct a hierarchical description for each part of a given contour.

Fig. 4. Hierarchical descriptions of key silhouettes

The obtained hierarchical descriptions of three key contours are shown in 
Fig. 4. Each contour was split into five parts at dominant corner points, and 
hierarchical description of each part was constructed. This description shows 
clearly that these three contours have similar shapes from a global point of 
view, and that the shapes of hollow parts and the head parts are different. 

5 Conclusion 

In this paper, we propose a method of contour figure approximation that smooths out only small shape components and clearly preserves large shapes. The amount of smoothing for detailed shape components is determined adaptively based on scale-space analysis.

This method has the following properties. First, only a finite number of approximated contours, which characterize the hierarchical structure of the original contour, are obtained. Second, the series of approximations of an approximated contour is entirely included in the series of approximations of the original contour. This inclusion property is important for judging the similarity between contour shapes. These properties make the method a promising step toward the recognition of contour figures.
Abstract. A one-dimensional deconvolution problem is discretized, and certain multilevel preconditioned iterative methods are applied to solve the resulting linear system. The numerical results suggest that multilevel multiplicative preconditioners may have no advantage over two-level multiplicative preconditioners. In fact, in the numerical experiments they perform worse than comparable two-level preconditioners.



1 Introduction 



In image deblurring, the goal is to estimate the true image $u_{\mathrm{true}}$ from noisy, blurred data

$$z(x) = \int k(x - y)\, u_{\mathrm{true}}(y)\, dy + \eta(x). \qquad (1)$$

Here η represents noise in the recorded data, and the convolution kernel function k, which is called the point spread function in this application, is known. Since deconvolution is unstable with respect to perturbations in the data, regularization (i.e., stabilization that retains certain desired features of the true solution) must be applied. To a discretized version of the model equation (1), $z = K u_{\mathrm{true}} + \eta$, we apply standard (zero-order) Tikhonov regularization; i.e., we minimize

$$T_\alpha(u) = \|Ku - z\|^2 + \alpha \|u\|^2, \qquad \alpha > 0. \qquad (2)$$

The parameter α is the regularization parameter. The resulting minimizer $u_\alpha$ solves the symmetric positive definite (SPD) linear system

$$Au = b, \qquad A = K^*K + \alpha I, \qquad (3)$$

with $b = K^*z$. The superscript * denotes the matrix conjugate transpose.

The system (3) is often quite large. For example, n = 256² = 65,536 unknowns arise from two-dimensional image data recorded on a 256 × 256 pixel array. For real-time imaging applications, these systems must be solved very quickly. Because of these size and time constraints, iterative methods are required. Since the coefficient matrix A in (3) is SPD, the conjugate gradient (CG) method is appropriate. A tends to be highly ill-conditioned, so preconditioning is needed to increase the convergence rate. (It should be noted that CG can be applied to the unregularized system, in which case the iteration count plays the role of the regularization parameter. See, for example, [4] for details. We will not take this approach here.) Convolution integral operators typically lead to matrices K with Toeplitz structure (see [2]). Circulant preconditioners [7,2,3] have proven to be highly effective for large Toeplitz systems. Standard multigrid methods have also been implemented (see, for example, [1]), but their utility seems limited by the inability to find a good "smoother", i.e., an iterative scheme that quickly damps high-frequency components of solutions on fine grids.

It is well-known that wavelet multilevel decompositions tend to “sparsify” the 
matrices which arise in the discretization of integral operators. The task which 
remains is to efficiently solve the transformed system. Rieder [6] showed that 
fairly standard block iterative schemes (e.g., Jacobi and Gauss-Seidel iterations) 
are effective. (The correspondence between block index and grid level makes 
these methods “multilevel”.) Hanke and Vogel [5,8] extended Rieder’s results to 
two-level preconditioners. Their analysis showed that multiplicative (i.e., Gauss- 
Seidel-like) preconditioners are generally far superior to additive (Jacobi-like) 
preconditioners in terms of convergence properties. In particular, they obtained the condition number bounds

$$ \mathrm{cond}(C_{\mathrm{add}}^{-1} A) \le b_1\, \alpha^{-1} \quad \text{as } \alpha \to 0, \qquad (4) $$

$$ \mathrm{cond}(C_{\mathrm{mult}}^{-1} A) \le b_2\, \alpha^{-1} \quad \text{as } \alpha \to 0, \qquad (5) $$

where C_add and C_mult denote the additive and multiplicative preconditioning matrices, respectively. They also presented numerical results indicating that these bounds are sharp.

In addition to rapid convergence, these two-level schemes offer other advan- 
tages. Toeplitz structure is not required for efficient implementation. Essentially 
all that is needed is a means of computing matrix-vector products Aw and a 
means of computing coarse-grid projections of vectors. See [8] for details. A disadvantage in certain situations is the need to invert the “coarse-grid representation” of the matrix A, which is the block Â_11 in equation (8) below. The constants b_i in (4)-(5) depend on how well the integral operator is represented on the coarse grid. If α is relatively small, then the b_i must also be relatively small to maintain rapid convergence. This typically means that Â_11 must be relatively large, and hence expensive to invert.

The cost of inverting relatively large coarse-grid representation matrices Â_11 in the two-level case motivated our interest in multilevel schemes. We conducted a preliminary numerical study which suggested that, at least with obvious implementations, multilevel schemes offer no advantage over two-level schemes. In the final section, we present the test problems used in this study. This is preceded by a brief sketch of multilevel iterative methods.

2 Multilevel Decomposition and Iterative Schemes 

Let the columns of V comprise the discrete Haar wavelet basis for R^n, where n is a power of two, normalized so that V^*V = I. Note that the discrete Haar wavelet vectors are orthogonal with respect to the usual Euclidean inner product, so orthonormality can be achieved simply by rescaling these vectors. The system (3) can be transformed to

$$ \hat A x = \hat b, \qquad (6) $$

where

$$ \hat A = V^* A V, \qquad u = V x, \qquad \hat b = V^* b. \qquad (7) $$
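The orthonormal Haar basis used in (7) can be built explicitly. In the sketch below (the helper names are hypothetical), the columns are ordered from coarse to fine, the constant vector first, which is our assumption about the ordering that makes the first block of Â the coarse-grid block; the code returns V as a list of rows and applies the transform x = V^*u and its inverse u = Vx.

```python
import math


def haar_basis(n):
    """Orthonormal discrete Haar basis of R^n (n a power of two), returned as a
    row-major matrix V whose columns are the basis vectors, coarse to fine."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    cols = [[1.0 / math.sqrt(n)] * n]          # constant (coarsest scaling) vector
    length = n
    while length >= 2:                          # wavelets with support n, n/2, ..., 2
        half = length // 2
        scale = 1.0 / math.sqrt(length)         # rescaling gives unit Euclidean norm
        for start in range(0, n, length):
            v = [0.0] * n
            for i in range(start, start + half):
                v[i] = scale
            for i in range(start + half, start + length):
                v[i] = -scale
            cols.append(v)
        length = half
    return [list(row) for row in zip(*cols)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


V = haar_basis(8)
u = [float(i) for i in range(8)]
x = matvec(transpose(V), u)     # wavelet coefficients x = V^* u, cf. (7)
u_back = matvec(V, x)           # back-transform u = V x
```

Because V is orthonormal, Â = V^*AV has the same eigenvalues as A; the transform only redistributes the entries so that the block structure (8) reflects the grid levels.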

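Partition the wavelet-transformed matrix Â into blocks Â_ij, 1 ≤ i, j ≤ m:

$$ \hat A = \begin{pmatrix} \hat A_{11} & \hat A_{12} & \cdots & \hat A_{1m} \\ \hat A_{21} & \hat A_{22} & \cdots & \hat A_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \hat A_{m1} & \hat A_{m2} & \cdots & \hat A_{mm} \end{pmatrix}. \qquad (8) $$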

Multilevel iterative methods can be derived from a natural splitting of these blocks, i.e.,

$$ \hat A = L + D + U, \qquad (9) $$

where U consists of the upper triangular blocks Â_ij, j > i, D consists of the diagonal blocks Â_ii, and L = U^*. For instance, to derive a multilevel additive Jacobi iteration to solve (3), take an initial guess u^0, set x^0 = V^* u^0, iterate

$$ x^{\nu+1} = D^{-1}\big(\hat b - (L + U)\, x^{\nu}\big), \qquad \nu = 0, 1, \ldots, $$

and then back-transform via (7) to obtain an approximate solution to the original system (3). To derive the additive Schwarz iteration presented by Rieder in [6], replace the Â_mm block in the block diagonal matrix D by αI, where I is the identity matrix of the appropriate size.
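The block Jacobi iteration above can be sketched directly in the wavelet domain. In this minimal version, which is an illustration rather than the authors' code, the blocks are given as index ranges, each diagonal block Â_ii is solved with dense Gaussian elimination, and the back-transform (7) is left out. The 4 × 4 test matrix and all names are hypothetical.

```python
def solve_dense(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        y[i] = (aug[i][n] - sum(aug[i][j] * y[j] for j in range(i + 1, n))) / aug[i][i]
    return y


def block_jacobi(A_hat, b_hat, blocks, iters=60):
    """Iterate x <- D^{-1}(b_hat - (L + U) x), where D holds the diagonal blocks
    A_hat[lo:hi, lo:hi] for each index range (lo, hi) in `blocks`."""
    n = len(b_hat)
    x = [0.0] * n
    for _ in range(iters):
        x_new = [0.0] * n
        for lo, hi in blocks:
            # b_hat minus the contributions of all off-diagonal blocks (old iterate)
            rhs = [b_hat[i] - sum(A_hat[i][j] * x[j] for j in range(n) if not lo <= j < hi)
                   for i in range(lo, hi)]
            diag_block = [row[lo:hi] for row in A_hat[lo:hi]]
            x_new[lo:hi] = solve_dense(diag_block, rhs)
        x = x_new
    return x


# Hypothetical transformed system split into two 2 x 2 "levels".
A_hat = [[4.0, 1.0, 0.5, 0.0],
         [1.0, 4.0, 0.0, 0.5],
         [0.5, 0.0, 4.0, 1.0],
         [0.0, 0.5, 1.0, 4.0]]
b_hat = [1.0, 2.0, 3.0, 4.0]
x = block_jacobi(A_hat, b_hat, blocks=[(0, 2), (2, 4)])
```

Replacing the last diagonal block by αI inside `block_jacobi` would give the additive Schwarz variant described above.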

Similarly, one can derive multilevel Jacobi and additive Schwarz preconditioners. To apply such a Jacobi preconditioner to a vector r ∈ R^n, one first applies the Haar wavelet transform to this vector, obtaining r̂ = V^* r. One then computes x = D^{-1} r̂, and then back-transforms via (7) to get

$$ u = V D^{-1} V^* r = C_J^{-1} r. $$

The matrix C_J = V D V^* is the multilevel Jacobi preconditioning matrix. To derive a multilevel additive Schwarz preconditioner, one again replaces the Â_mm block in D by αI.

These Jacobi/additive Schwarz iterative methods neglect off-diagonal terms in the block decomposition (8). Incorporating these off-diagonal terms leads to multiplicative iterative methods. Perhaps the simplest example is multilevel Gauss-Seidel iteration, which can be expressed as u^ν = V x^ν, where x^ν is obtained from

$$ x^{\nu+1} = (L + D)^{-1}\big(\hat b - U x^{\nu}\big). \qquad (10) $$

A multilevel multiplicative Schwarz iteration is obtained by again replacing the Â_mm block of D by αI.

To obtain symmetric Gauss-Seidel/multiplicative Schwarz iterations, denote the result of each step (10) by x^{ν+1/2} and follow it by the backward sweep

$$ x^{\nu+1} = (D + U)^{-1}\big(\hat b - L x^{\nu+1/2}\big). \qquad (11) $$

To obtain the action of a multilevel symmetric Gauss-Seidel preconditioner on a vector r, replace b̂ in (10)-(11) by r̂ = V^* r, set x^0 = 0, and then back-transform via (7) to obtain

$$ C_{SGS}^{-1} r = V (D + U)^{-1}\big(V^* r - L (L + D)^{-1} V^* r\big) = V (D + U)^{-1} D (L + D)^{-1} V^* r, \qquad (12) $$

where the second equality uses V^*r − L(L + D)^{-1}V^*r = ((L + D) − L)(L + D)^{-1}V^*r. Consequently, the symmetric Gauss-Seidel preconditioning matrix is

$$ C_{SGS} = V (L + D) D^{-1} (D + U) V^*. \qquad (13) $$

Once again replacing the Â_mm block in D by αI yields a corresponding multilevel symmetric multiplicative Schwarz (SMS) preconditioner, denoted C_SMS.
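The identity (12) can be checked numerically. The sketch below works directly in the wavelet domain (V omitted) and, as a simplifying assumption, uses 1 × 1 diagonal blocks, so L, D and U are the strictly lower, diagonal and strictly upper parts of a small SPD matrix. One forward sweep (10) and one backward sweep (11) from x^0 = 0 give C^{-1}r̂; multiplying the result by (L + D)D^{-1}(D + U), the matrix in (13) without V, should return r̂. Function names and the test matrix are hypothetical.

```python
def sgs_apply(A, r):
    """Apply C^{-1} = (D + U)^{-1} D (L + D)^{-1} to r via one forward sweep (10)
    and one backward sweep (11) starting from x^0 = 0 (1 x 1 blocks assumed)."""
    n = len(r)
    x_half = [0.0] * n
    for i in range(n):                 # forward sweep: (L + D) x_half = r
        x_half[i] = (r[i] - sum(A[i][j] * x_half[j] for j in range(i))) / A[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):     # backward sweep: (D + U) x = r - L x_half
        s = (r[i] - sum(A[i][j] * x_half[j] for j in range(i))
             - sum(A[i][j] * x[j] for j in range(i + 1, n)))
        x[i] = s / A[i][i]
    return x


def sgs_matrix_apply(A, x):
    """Multiply x by C = (L + D) D^{-1} (D + U)."""
    n = len(x)
    y = [sum(A[i][j] * x[j] for j in range(i, n)) for i in range(n)]       # (D + U) x
    y = [y[i] / A[i][i] for i in range(n)]                                  # D^{-1}
    return [sum(A[i][j] * y[j] for j in range(i + 1)) for i in range(n)]    # (L + D)


A = [[4.0, 1.0, 0.0, 0.5],
     [1.0, 3.0, 0.5, 0.0],
     [0.0, 0.5, 2.0, 0.3],
     [0.5, 0.0, 0.3, 5.0]]
r = [1.0, -2.0, 0.5, 3.0]
z = sgs_apply(A, r)
```

In practice only `sgs_apply` is needed inside preconditioned CG; the explicit product is there just to confirm (12)-(13).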



3 Numerical Results 



A symmetric Toeplitz matrix K = h × toeplitz(k) was generated from a discretization k = (k(x_1), ..., k(x_n)) of the Gaussian kernel function

$$ k(x) = \exp(-x^2/\sigma^2), \qquad 0 \le x \le 1. \qquad (14) $$

Here h = 1/n and x_i = (i − 1)h, i = 1, ..., n. We selected the kernel width parameter σ = 0.05 and the number of grid points n = 2^7 = 128. The n × n matrix K is extremely ill-conditioned, having eigenvalues which decay to zero like exp(−σ²j²) for large j. The matrix A = K^*K + αI was computed with regularization parameter α = 10^(−…). The distribution of the eigenvalues of A is shown in the upper left subplot of Fig. 1. From this distribution, it can be seen that the eigencomponents corresponding to roughly the smallest 110 eigenvalues of K have been filtered out by the regularization. The value of the regularization parameter is nearly optimal for error-contaminated data whose signal-to-noise ratio is 100.
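A sketch of how K in (14) can be assembled, with h = 1/n, x_i = (i − 1)h and the values σ = 0.05, n = 128 quoted above. The helper name is hypothetical, and the sketch builds only K.

```python
import math


def gaussian_toeplitz(n, sigma):
    """K = h * toeplitz(k) with k_i = exp(-x_i**2 / sigma**2), x_i = (i - 1) h, h = 1/n."""
    h = 1.0 / n
    # 0-based index i here corresponds to x_{i+1} = i * h in the 1-based notation above
    k = [math.exp(-((i * h) ** 2) / sigma ** 2) for i in range(n)]
    # a symmetric Toeplitz matrix has entry k[|i - j|] in row i, column j
    return [[h * k[abs(i - j)] for j in range(n)] for i in range(n)]


K = gaussian_toeplitz(128, 0.05)
```

Entries far from the diagonal underflow to essentially zero, which is why such narrow-kernel matrices are numerically banded even though they are stored densely here.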

We computed several two- and three-level SMS preconditioners for the system (3). The notation C_SMS(r) denotes the two-level (m = 2 in equation (8)) SMS preconditioner with the “coarse grid” block Â_11 of size r × r. To obtain C_SMS(r), the matrix A is transformed and partitioned into 2 × 2 blocks, cf. (8). The (n − r) × (n − r) submatrix Â_22 is replaced by αI_{n−r}, the splitting (9) is applied, and the right-hand side of (13) is computed. The eigenvalues of the matrix products C_SMS(r)^{-1}A were computed for coarse grid block sizes r = 16 and r = 32. The distributions of these eigenvalues are displayed in the upper right and lower left subplots of Fig. 1. The reduced relative spread and clustering of these eigenvalues lead to rapid CG convergence. Recall that the relative spread of the eigenvalues can be quantified by the condition number, which is the ratio of the largest to the smallest eigenvalue. With coarse grid block size r = 16, the condition number of C_SMS(16)^{-1}A is 48.419, while for block size r = 32, the corresponding condition number is 1.4244. As one might expect, doubling the size of the coarse grid block Â_11 substantially decreased the condition number. In contrast, the condition number of the matrix A (without preconditioning) is about 824.






[Figure panels omitted; lower right panel: “Eigenvalues of C(16,32)^{-1} A”]

Fig. 1. Eigenvalue distributions for various multilevel symmetric multiplicative Schwarz preconditioned systems.



Let C_SMS(r, s) denote the three-level multiplicative Schwarz preconditioner whose coarse grid block Â_11 in (8) has size r × r and whose second diagonal block Â_22 has size (s − r) × (s − r). The third diagonal block Â_33 is replaced by αI_{n−s}. With r = 16 and s = 32, the distribution of the eigenvalues of C_SMS(16, 32)^{-1}A is shown in the lower right subplot of Fig. 1. The corresponding condition number is 200.85. This is substantially worse than the result for the two-level preconditioner with coarse grid block size 32 × 32. More surprisingly, it is also worse than the result for the two-level preconditioner with coarse grid block size 16 × 16, even though much more work is required to apply C_SMS(16, 32)^{-1} than to apply C_SMS(16)^{-1}.

The condition numbers for the various matrix products C^{-1}A in the example presented above are summarized in column 2 (Test Case 1) of the table below. The I in column 1 indicates that no preconditioning is applied, i.e., the condition number of A appears across the corresponding row. In column 3 (Test Case 2), results are presented for the same kernel function, cf. equation (14), but with the kernel width increased to σ = 0.1 and the regularization parameter decreased to α = 10^(−…). As in Case 1, the two-level preconditioners outperform comparable three-level preconditioners. The difference in performance is even more pronounced with the broader kernel than with the narrower kernel. In column 4 (Test Case 3), we present results for the sinc squared kernel, k(x) = (sin(πx/σ)/(πx/σ))², with kernel width σ = 0.2 and regularization parameter α = 10^(−…). The results are comparable to those obtained in Case 2.



Preconditioner C    Test Case 1       Test Case 2       Test Case 3
                    cond(C^{-1}A)     cond(C^{-1}A)     cond(C^{-1}A)
I                   824               8777              3175
C_SMS(16)           48                2.2               1.5
C_SMS(32)           1.4               1.1               1.1
C_SMS(16,32)        201               254               86



Conclusions. For all three test problems, the 3-level SMS preconditioners yielded 
larger condition numbers and less eigenvalue clustering than comparable 2-level 
SMS preconditioners. While these test cases may be unrealistically simple, they 
suggest that no advantage is to be gained by implementing multilevel precondi- 
tioners for more realistic (and complicated) problems. 
Abstract. Using the registration of remote imagery as an example domain, this work describes an efficient approach to the structural matching of multi-resolution representations where the scale difference, rotation and translation are unknown. The matching process is posed within an optimisation framework in which the parameter space is the probability hyperspace of all possible matches. In this application, searching for corresponding features at all scales generates a parameter space of enormous dimension, typically 1 to 10 million. In this work we use the features’ hierarchical relationships to decompose the parameter space into a series of smaller subspaces over which optimisation is computationally feasible.



Key Words: Multi-Scale Matching, Structural Matching, Optimisation 

1 Introduction 

Extracting extended image features and their relationships from images enables the application of structural matching techniques to image-to-image remote sensing registration problems [7]. A multi-resolution contour representation of the coastline is constructed in the next section for two reasons. First, in different modalities, coastlines may be captured at different scales [5, 6]. Second, match results at higher levels within the hierarchy can be propagated down to lower levels [3]. While we restrict ourselves to coastlines, the problem generalises to any non-iconic multi-scale structural representation where the number of candidate matches is enormous and the proportion of correct matches is very small. Ensuring the global convexity of our match functional, or recovering an initial probability estimate close to the global optimum, is practically impossible. In section 3 we formulate the registration as an optimisation problem, and introduce the decomposition principle which underpins our approach. The existence of hierarchical relationships between features at different scales enables the optimisation functional, defined over an enormous parameter space, to be decomposed into a recursive series of functionals defined over smaller subspaces over which optimisation is computationally feasible. A Genetic Algorithm [1,4,2] is employed both to facilitate escape from local optima and to generate multiple good hypotheses for the recursive optimisation procedure.
2 Multi-Resolution Segmentation of Coastline Contours

As images of a coastline may be captured by different types of sensor at arbitrary heights, matching may have to be performed between a pair of images taken at different resolutions. A pyramidal multi-resolution representation of each image may be generated by repeated subsampling. By choosing a small subsampling factor between levels in our image pyramid, we can ensure that the two images are similar at some scale difference (see figure 1). Moreover, additional match constraint is available by demanding that the correct solution generate a consistent set of correspondences at all levels of the hierarchy.





Fig. 1. (a) Matching Multi-Resolution Representations (b) Segmented Codons 



Extraction of the coastline contours in each image is achieved by first binarising 
satellite images into land and sea regions, and then extracting edge chains using a 
region boundary extraction algorithm. These contour chains are then segmented 
into codons, i.e., significant contour segments [5]. Natural points at which to segment
the contours are curvature extrema and curvature zero-crossings. Examples of 
extracted codons are shown in figure 1(b). 

The multi-resolution segmentation technique outlined above produces a series of hierarchically related image features. At the lowest level in this hierarchy (highest resolution) are the original sets of codons Λ_0 and Ω_0 generated from each image respectively. At the next level, the feature sets Λ_1 and Ω_1 are generated by subsampling the image of the previous layer. This is repeated until the feature sets Λ_{L−1} and Ω_{L−1} of the topmost layer are recovered. Where the scale rises by a factor of √2 through the hierarchy, the number of features in each layer reduces approximately by a factor of √2.

The hierarchical relationships between features from adjoining layers in the multi-resolution pyramid may be captured by the sets ℋ_l[λ], λ ∈ Λ_l, 1 ≤ l < L, and ℋ_l[ω], ω ∈ Ω_l, 1 ≤ l < L. Each set contains the features from the higher resolution layer l − 1 which are contained in the lower resolution feature λ (respectively ω). (Since features in the sets Λ_0 and Ω_0 are at the bottom-most level of the hierarchy, hierarchical sets cannot be computed for them.)
3 Posing Registration as an Optimisation Problem

A match γ is constructed using one feature from each image such that γ = {λ, ω}, where the features λ and ω are drawn from the feature sets Λ and Ω respectively. Let p(γ) be the probability of the match γ. The matching process may now be defined as a procedure for computing the set of match probabilities. Because matching must be possible between two contours of potentially different scales, any feature in one image must be allowed to match features at all scales in the second image. The set of candidate matches Γ from which γ is drawn is therefore defined as

$$ \Gamma = \Lambda \times \Omega, \qquad \Lambda = \bigcup_{l=0}^{L-1} \Lambda_l, \quad \Omega = \bigcup_{l=0}^{L-1} \Omega_l. \qquad (1) $$

Thus the size of the candidate match set is given by the Cartesian product of the full hierarchies of features. For typical satellite imagery with image sizes of 2000 × 2000, at least 1000 codons may be generated at the highest resolution of the feature hierarchy, resulting in potentially 10M matches.
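The 10M figure can be reproduced roughly. Assuming, as a simplification, that both images yield 1000 codons at the finest level, that the count falls by a factor of √2 per level as stated in section 2, and a hypothetical pyramid of 12 levels, the size of Γ = Λ × Ω is:

```python
import math


def features_per_level(finest, levels, ratio=math.sqrt(2)):
    """Approximate codon count at each pyramid level, finest level first."""
    return [finest / ratio ** l for l in range(levels)]


def candidate_match_count(finest, levels):
    """|Gamma| = |Lambda| * |Omega| for two images with identical hierarchies."""
    total = sum(features_per_level(finest, levels))
    return total * total


count = candidate_match_count(1000, 12)
print(f"{count:.3g}")   # prints 1.13e+07
```

The geometric series converges to about 3.4 times the finest-level count, so the coarser levels add little; the quadratic growth of the Cartesian product is what makes |Γ| so large.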

Irrespective of whether it plays an inhibiting or supportive role, each match 
can be considered as a source of contextual information about other matches 
and may therefore aid their interpretation. Structural match constraint may be 
defined as the degree to which pairs of matches are mutually compatible. Sources 
of such constraint are usually derived from world knowledge such as uniqueness, 
continuity, topology and hierarchy [3]. The degree of compatibility between any pair of matches γ and γ' is captured by a compatibility value C(γ, γ') satisfying

$$ -1 \le C(\gamma, \gamma') \le 1. \qquad (2) $$

Pairs of correct matches should ideally enjoy a strong level of mutual compatibility, while pairs containing a false match should generate low levels of compatibility. This suggests the following optimisation functional. If p = (p(γ_1), ..., p(γ_M)) is the vector containing the probabilities of all M matches in Γ, then a suitable functional F_Γ(p), which measures the degree of mutual compatibility for a mapping p, may be defined as

$$ F_\Gamma(p) = \sum_{\gamma \in \Gamma} \sum_{\gamma' \in \Gamma,\ \gamma' \ne \gamma} C(\gamma, \gamma')\, p(\gamma)\, p(\gamma'), \qquad (3) $$

which may be maximised by eliminating matches (i.e. p(γ) → 0) which increase the degree of incompatibility. F_Γ(p) is a functional defined over the M-dimensional space 𝒫 = p(γ_1) × p(γ_2) × ... × p(γ_M), where the correct mapping is represented by the probability vector p̂ which maximises equation 3, i.e.

$$ \hat p = \arg\max_{p \in \mathcal{P}} F_\Gamma(p). \qquad (4) $$

This optimisation functional may be rewritten in vector-matrix form

$$ F_\Gamma(p) = \tfrac{1}{2}\, p\, Q\, p^T, \qquad (5) $$

where the symmetric matrix Q stores the compatibility terms (the overall scaling does not affect the maximiser).
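A toy evaluation of (3), using a hypothetical 4 × 4 compatibility table in which matches 0-2 support each other and match 3 conflicts with all of them. Because (3) has no p(γ)² terms, F_Γ is affine in each p(γ) separately, so, taking 𝒫 to be the box [0, 1]^M (an assumption of this sketch), its maximum is attained at a vertex; for very small M an exhaustive search over {0, 1}^M therefore finds the maximiser (4). The brute force is purely illustrative and is not the optimiser used in this work.

```python
from itertools import product


def functional(C, p):
    """F(p) = sum over ordered pairs gamma != gamma' of C[gamma][gamma'] p(gamma) p(gamma'), cf. (3)."""
    M = len(p)
    return sum(C[g][h] * p[g] * p[h] for g in range(M) for h in range(M) if g != h)


def best_vertex(C):
    """Exhaustive search over p in {0, 1}^M; only feasible for tiny M."""
    return max(product((0, 1), repeat=len(C)), key=lambda p: functional(C, p))


C = [[0.0, 0.8, 0.8, -0.9],
     [0.8, 0.0, 0.8, -0.9],
     [0.8, 0.8, 0.0, -0.9],
     [-0.9, -0.9, -0.9, 0.0]]
p_hat = best_vertex(C)   # (1, 1, 1, 0): the conflicting match 3 is eliminated
```

With M in the millions this search is hopeless, which is what motivates the decomposition of the next section.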
4 Hierarchical Subspace Decomposition

Where |Λ × Ω| is very large, direct optimisation of equation 5 is impractical.
However, for the hierarchically organised features of this application, it is possible 
to partition the probability space V into a series of smaller subspaces over which 
the optimisation process may be performed independently. 

Let 𝒫' represent a subspace of the full probability space 𝒫. Let us assume that the position of the global maximum p̂ of a functional F_𝒫 defined over 𝒫 contains the position of the global maximum p̂' of a functional F_𝒫' defined over the smaller subspace 𝒫'. This is true if the matches whose probabilities are contained in 𝒫' are hierarchical parents of the matches whose probabilities are contained in 𝒫. In this case, the maximisation may be decomposed into two independent problems, each over a smaller probability space, i.e. first maximise F_𝒫' before maximising the full functional F_𝒫.

In fact we can partition 𝒫 into an ordered series of N smaller subspaces 𝒫_0, 𝒫_1, ..., 𝒫_{N−1} such that the position of the maximum p̂_n of each functional F_{𝒫_n} defined over 𝒫_n is contained within the position of the maximum p̂ of the functional F_𝒫 defined over the full probability space 𝒫. Thus the global maximum of the functional F_𝒫 is the concatenation of the local maxima

$$ \hat p = \arg\max_{p \in \mathcal{P}} F_{\mathcal{P}}(p) = (\hat p_0, \hat p_1, \ldots, \hat p_{N-1}). \qquad (6) $$



Each of these local maxima (global maxima in their respective subspaces) is defined as before, i.e. as the vector p̂_n ∈ 𝒫_n which maximises the functional F_{𝒫_n}:

$$ \hat p_n = \arg\max_{p_n \in \mathcal{P}_n} F_{\mathcal{P}_n}(p_n), \qquad (7) $$

where F_{𝒫_n} is defined in recursive form as

$$ F_{\mathcal{P}_n}(p_n) = \tfrac{1}{2}\, p_n Q_n p_n^T + p_n \cdot h_n, \qquad h_n = H_n (\hat p_0, \ldots, \hat p_{n-1})^T, \qquad F_{\mathcal{P}_0}(p_0) = \tfrac{1}{2}\, p_0 Q_0 p_0^T. \qquad (8) $$

The maximisation of F_{𝒫_n} depends on the positions of the earlier maxima p̂_0, ..., p̂_{n−1}, implying that the functionals must be maximised in a particular order. Consequently, equation 7 specifies an estimator which converges on the global maximum p̂ over progressively larger proportions of the full parameter space 𝒫.
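One way to see where the recursive form (8) comes from, assuming p is split as (p_0, ..., p_{N−1}) conformally with the subspaces and Q is partitioned into blocks Q_{nk} with Q_{kn} = Q_{nk}^T, is to expand (5) blockwise:

```latex
\tfrac{1}{2}\, p Q p^T
  = \tfrac{1}{2} \sum_{n=0}^{N-1} \sum_{k=0}^{N-1} p_n Q_{nk} p_k^T
  = \sum_{n=0}^{N-1} \Big( \tfrac{1}{2}\, p_n Q_{nn} p_n^T
      + p_n \sum_{k<n} Q_{nk} p_k^T \Big),
```

since p_n Q_{nk} p_k^T = p_k Q_{kn} p_n^T for every pair n ≠ k, so each off-diagonal pair appears twice in the double sum. Under this reading, with Q_n = Q_{nn}, H_n = (Q_{n0}, ..., Q_{n,n−1}) and the earlier blocks frozen at their maxima p̂_k, the n-th term is exactly F_{𝒫_n}(p_n) in (8); for n = 0 the inner sum is empty, which gives F_{𝒫_0}.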

Features from higher up the feature hierarchy tend to capture the larger-scale structure in the image. Feature matches are ordered hierarchically, allowing solutions at higher levels to guide the match process further down the hierarchy. Our hierarchical propagation strategy partitions this full ordered match probability space into a series of smaller subspaces. A genetic algorithm is used to recover a number of good yet disparate solutions in each partition. The union of these solutions is used in conjunction with the hierarchical relations ℋ[γ], γ ∈ Γ, generated in section 2 to generate the next partition. Note also that as the process progresses, the descendants of all matches not selected are pruned from the search space. This dramatically reduces the dimensionality of the problem.

Generating the Initial Subspace Partition. Depending on the scale difference between the two images, the correct mapping will map either the topmost feature set Λ_{L−1} onto one of the feature sets Ω_0, ..., Ω_{L−1}, or the topmost set Ω_{L−1} onto one of the feature sets Λ_0, ..., Λ_{L−1}. Thus the first partition Γ_0 should be restricted to locating the first correct mapping among this most salient feature match set, i.e.

$$ \Gamma_0 = \bigcup_{l=0}^{L-1} \big( \{\Lambda_{L-1} \times \Omega_l\} \cup \{\Lambda_l \times \Omega_{L-1}\} \big). \qquad (9) $$

Any limits on the expected scale difference between the pair of images will significantly reduce the size of Γ_0. There are typically 20-30 codon features at the highest level of our hierarchy, while the expected scale difference is no greater than two or three octaves, generating an initial partition with |Γ_0| < 6000. While still a very large space, this is considerably smaller than Γ, whose size can be several million.

Since we are not employing the full hierarchical match constraint available, there is an increased likelihood that the best solution will not coincide with the global solution. Consequently, we recover the best N_0 solutions, from which the set M_0 of matches for this most salient partition is recovered.

Propagating Match Information. Having found the match solution M_{n−1} from a previous partition Γ_{n−1}, the hierarchical relations ℋ may be used to dramatically prune the set of as yet unprocessed matches Γ − {Γ_{n−1} ∪ ... ∪ Γ_0}. The next partition need only contain matches whose parent features belong to a match in the previous match pool M_{n−1}. Thus if γ represents a match between two features λ and ω, then

$$ \Gamma_n = \bigcup_{\gamma \in M_{n-1}} \mathcal{H}[\lambda] \times \mathcal{H}[\omega]. \qquad (10) $$

On average each hierarchical set ℋ has √2 features. Consequently the size of the next partition is given by |Γ_n| ≈ 2|M_{n−1}|. Unlike the first partition, this is typically a few hundred matches, which enables rapid optimisation.
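Equation (10) amounts to expanding each retained match into the Cartesian product of its two features' hierarchical sets. In the sketch below, the dictionaries mapping a feature to its set ℋ, the feature labels and the function name are all hypothetical; children of features that belong to no retained match never appear, which is the pruning described above.

```python
def next_partition(match_pool, children_a, children_b):
    """Gamma_n as the union over (lam, omega) in M_{n-1} of H[lam] x H[omega], cf. (10)."""
    partition = set()
    for lam, omega in match_pool:
        for a in children_a.get(lam, ()):
            for b in children_b.get(omega, ()):
                partition.add((a, b))
    return partition


children_a = {"A1": ["a1", "a2"], "A2": ["a3"], "A3": ["a4", "a5"]}
children_b = {"W1": ["w1", "w2"], "W2": ["w3", "w4"]}
pool = [("A1", "W1"), ("A2", "W2")]        # M_{n-1}; feature A3 was not selected
gamma_n = next_partition(pool, children_a, children_b)
```

With about √2 children per feature on each side, every retained match contributes about two candidates, consistent with |Γ_n| ≈ 2|M_{n−1}|.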

The sets of multiple solutions from these repeated optimisations are ordered to recover the best N_n solutions, which in turn are propagated to the next partition. The above procedure is repeated until all of the matches within Γ have been included within a partition and optimised. The solution of the first partition effectively recovers the scale difference ΔS of the mapping between the image pair. Thus the number of subsequent partitions is L − ΔS − 1, where L is the number of levels in the codon pyramid.
5 Conclusions

A graph matching framework for registering images has been proposed which 
enjoys a number of advantages over traditional techniques relying on similarity 
of tokens. First there is no requirement for both images to be captured at the 
same scale or resolution. Second, as only local topological constraint is used, no 
global transformation between the images is assumed. This is particularly useful 
where the images may contain severe local geometric distortions. The graph 
matching problem has been formulated within an optimisation framework. This 
not only provides a principled manner of combining complex match constraint, 
but enables us to explore a number of different optimisation techniques already 
reported in the optimisation and computer vision literature. 

The primary difficulties of the problem are the extremely high dimensionality of the optimisation space, the very low density of correct matches, and the non-convexity of the match functional. Two strategies have been employed to ameliorate these difficulties. First, exploiting the multi-scale representation already built for each image, a multi-scale matching (or hierarchical propagation) strategy delivers a considerable increase in speed by partitioning the match problem into a number of decomposed steps. Matching is first performed at higher levels in the hierarchy, and the recovered mappings are then propagated down to lower levels. To exploit this hierarchical decomposition, the optimisation functional itself had to be decomposed, so that match probabilities computed for matches higher up the hierarchy contribute to the optimisation process. Second, as direct descent optimisation techniques will not perform well where the match density is low or the functional is highly non-convex, a search strategy based on the genetic algorithm is used to escape local optima and search for the global optimum.
Abstract. We introduce a 3D tracing method based on differential geometry in Gaussian blurred images. The line point detection part of the
tracing method starts with calculation of the line direction from the 
eigenvectors of the Hessian matrix. The sub-voxel center line position is 
estimated from a second order Taylor approximation of the 2D intensity 
profile perpendicular to the line. The line diameter is obtained at a single 
scale using the theoretical scale dependencies of the 0-th and 2nd order 
Gaussian derivatives at the line center. Experiments on synthetic images 
reveal that the localization of the centerline is mainly affected by line 
curvature. The diameter measurement is accurate for diameters as low 
as 4 voxels. 



1 Introduction 

Quantitative analysis of curvilinear structures in images is of interest in various 
research fields. In medicine and biology researchers need estimates of length and 
diameter of line-like structures such as chromosomes [11], blood vessels or neuron dendrites [1] for diagnostic or scientific purposes. In the technical sciences there
is an interest in center line positions of line structures in engineering drawings 
[2] or automatic detection of roads in aerial images [3]. 

Any method for detection of the centerline of curvilinear structures needs a 
criterion for a certain position in the image to be part of a center line. Methods 
differ in the definition of such a criterion and in the way the criterion is evaluated. 

In a first class of methods a line point is defined as a local grey value max- 
imum relative to neighboring pixels or voxels [4] [5]. Since no reference is made 
to properties of line structures in an image this class will generate many false 
hypotheses of line points if noise is present. 

A better criterion for a line point is to consider it to be part of a structure whose length is larger than its diameter [6]. This criterion can be formalized within the framework of differential geometry [3], [6], [7], [8].

There are several computational approaches to computing the differential 
structure in an image. In the facet model of Haralick [8] image derivatives in 
a 2D image are calculated from a third order polynomial fit to the image data 
in a 5x5 neighborhood. Along the line perpendicular to the direction of the line 
structure the sub pixel position where the first derivative vanishes is estimated 
from the polynomial. A pixel is declared a line point if this position is within
the pixel boundaries. The main drawback of this method is that the differential 
structure of the image is calculated at the fixed inner scale of the image which 
may lead to erroneous line center positions in case of noise or bar shaped intensity 
profiles. 

An essentially different computational approach is to calculate image deriva- 
tives at a scale adapted to the line diameter by convolution of the image with 
Gaussian derivative kernels [9]. The properties of the Gaussian kernel reduce the influence of noise and ensure meaningful first and second order derivatives even in the case of plateau-like intensity profiles across the line [3] [10].

Few line detection or tracing methods provide an estimate of the line width [6] 
[11]. In [11] the diameter is estimated from a fit of a function describing the line 
profile. This method suffers from the same noise sensitivity as the facet model 
[8]. This problem can be avoided by using the scale dependency of normalized 
second derivatives to estimate line diameter [6]. However, in [6] no evaluation of the method is presented, and the diameter is found by iteration over scale, which is computationally expensive.

In this paper we present a 3D line tracer which uses the line point detection method presented in [10] [3] and measures the diameter at a single scale, based on the theoretical scale dependency of Gaussian derivatives in the image.

2 Tracing of 3D curvilinear structures 

Our tracing procedure starts by selecting a discrete point P_d at position (x, y, z) in the image close to a center line position. At this position we calculate the Gaussian derivatives up to order two. The second order derivatives are used to build up the Hessian matrix H, from which we calculate the 3 eigenvalues λ_t, λ_n, λ_m and the corresponding eigenvectors t, n and m. The eigenvectors form an orthonormal basis for a local Cartesian coordinate system through the center of the voxel. The vector t, which is aligned with the line direction, is the eigenvector whose eigenvalue λ_t is smallest in magnitude [6].
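The choice of the line direction can be sketched as follows. Any symmetric eigensolver would do; cyclic Jacobi rotations are used here only to keep the example dependency-free, and the Hessian values (a bright line running along (1, 1, 0)/√2) and function names are hypothetical.

```python
import math


def symmetric_eigen(H, sweeps=50, tol=1e-14):
    """Eigenpairs of a real symmetric matrix by cyclic Jacobi rotations.
    Returns (values, vectors) with vectors[k] the unit eigenvector of values[k]."""
    n = len(H)
    a = [list(row) for row in H]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):          # a <- a P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # a <- P^T a (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):          # v <- v P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][i] for k in range(n)] for i in range(n)]
    return values, vectors


def line_direction(H):
    """Eigenpair (lambda_t, t) of the Hessian whose eigenvalue is smallest in magnitude."""
    values, vectors = symmetric_eigen(H)
    k = min(range(len(values)), key=lambda i: abs(values[i]))
    return values[k], vectors[k]


H = [[-2.005, 1.995, 0.0],
     [1.995, -2.005, 0.0],
     [0.0, 0.0, -5.0]]
lam_t, t = line_direction(H)
```

The two remaining eigenpairs (here −4 and −5) supply n and m, the strongly curved directions across the line.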

Locally around point P_d, the grey value distribution in the plane perpendicular to the line direction is approximated by the second order Taylor polynomial

$$ I(\xi, \eta) \approx I + p \cdot \nabla I + \tfrac{1}{2}\, p^T H p, \qquad (1) $$

where I and ∇I are the Gaussian blurred grey value and gradient vector at the current discrete voxel position P_d. In (1), p is a vector in the plane perpendicular to the line direction defined by n and m, i.e. p = ξn + ηm.

The center line position P_c relative to the voxel center is found by setting the first derivatives of the local Taylor polynomial with respect to ξ and η to zero [10] and solving the resulting linear equations for ξ and η. The sub-voxel center line position P_s is calculated as P_s = P_d + P_c. If P_s is not within the boundaries of the current voxel, the line point estimation procedure is carried out again at the discrete voxel position closest to P_s. This procedure is repeated until P_s is within the boundaries of the current voxel. The tracing proceeds by taking a step from the estimated position P_s in the t-direction.
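Because n and m are eigenvectors of H, the cross term n^T H m vanishes and the two equations obtained from (1) decouple: n·∇I + ξλ_n = 0 and m·∇I + ηλ_m = 0. The sketch below computes P_c = ξn + ηm and the voxel test, assuming unit voxel spacing and a voxel extending ±0.5 around its center; the function names and numbers are hypothetical.

```python
def subvoxel_offset(grad, lam_n, n_vec, lam_m, m_vec):
    """P_c = xi n + eta m from the stationary point of the Taylor polynomial (1).
    With n, m Hessian eigenvectors the 2 x 2 system is diagonal:
    xi = -(n . grad I) / lambda_n and eta = -(m . grad I) / lambda_m."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    xi = -dot(n_vec, grad) / lam_n
    eta = -dot(m_vec, grad) / lam_m
    return [xi * nc + eta * mc for nc, mc in zip(n_vec, m_vec)]


def inside_voxel(offset, half_width=0.5):
    """True if the offset from the voxel center stays within the current voxel."""
    return all(abs(c) <= half_width for c in offset)


# Hypothetical derivatives at the discrete voxel P_d = (10, 20, 30).
P_d = [10, 20, 30]
P_c = subvoxel_offset([0.0, 0.8, 0.0], -4.0, [0.0, 1.0, 0.0], -5.0, [0.0, 0.0, 1.0])
P_s = [d + c for d, c in zip(P_d, P_c)]   # if not inside_voxel(P_c), restart at the voxel nearest P_s
```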



3 Diameter estimation 



For diameter estimation it is necessary to take into account the shape of the 2D grey value profile perpendicular to the line [6]. This grey value profile I(r) is assumed to obey the following conditions:

$$ I(r) = \begin{cases} I_0 f(r), & r < R, \\ 0, & r \ge R. \end{cases} \qquad (2) $$

In (2), I_0 is the grey value at the center line, r = √(ξ² + η²) is the distance from the center line position, and R is the radius of the line structure. The first derivative of I(r) is assumed to vanish at the centerline.

We use the scale dependencies of $I(r)$ convolved with a Gaussian, and of the second Gaussian derivatives of $I(r)$ at $r = 0$, to estimate the line diameter. For this purpose, expressions are derived for the Gaussian blurred intensity $I(R,\sigma)$ and the Laplacian $\Delta_{\perp} I(R,\sigma)$ restricted to the span of $n$ and $m$:

$$ I(R,\sigma) = I_0 \int_0^{2\pi}\!\!\int_0^{R} f(r)\, g(r,\sigma)\, r\, dr\, d\theta, \tag{3}$$

$$ \Delta_{\perp} I(R,\sigma) = I_0 \int_0^{2\pi}\!\!\int_0^{R} f(r)\, g_{rr}(r,\sigma)\, r\, dr\, d\theta. \tag{4}$$

In (3) and (4), $g(r,\sigma)$ and $g_{rr}(r,\sigma)$ are the 2D Gaussian and its second derivative in the $r$-direction. The expressions for $I(R,\sigma)$ and the normalized Laplacian $\sigma^2 \Delta_{\perp} I(R,\sigma)$ are used to construct a non-linear filter. This filter is rotation invariant with respect to the line direction and independent of $I_0$:

$$ h(R,\sigma) = \frac{-\sigma^2\, \Delta_{\perp} I(R,\sigma)}{I(R,\sigma)}. \tag{5}$$
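Evaluating (3) for the pillbox profile ($f(r) = 1$ for $r \le R$) shows why only a dimensionless ratio of $R$ and $\sigma$ survives. The calculation below assumes the normalised 2D Gaussian $g(r,\sigma) = (2\pi\sigma^2)^{-1} \exp(-r^2/(2\sigma^2))$; the normalisation is our assumption, since the text does not state it:

$$ I(R,\sigma) = I_0 \int_0^{2\pi}\!\!\int_0^{R} \frac{e^{-r^2/(2\sigma^2)}}{2\pi\sigma^2}\, r\, dr\, d\theta = \frac{I_0}{\sigma^2} \int_0^{R} e^{-r^2/(2\sigma^2)}\, r\, dr = I_0\Big(1 - e^{-R^2/(2\sigma^2)}\Big) = I_0\big(1 - e^{-q}\big), \qquad q = \tfrac{1}{2}(R/\sigma)^2. $$

The blurred centre-line intensity therefore depends on $R$ and $\sigma$ only through $q$. Because $\Delta_{\perp} I$ in (4) is also linear in $I_0$, the factor $I_0$ cancels in the ratio (5).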



The filter output $h(R,\sigma)$ depends on the choice of $f(r)$. For a parabolic and a pillbox profile, the integrals appearing in eqs. (3) and (4) can be evaluated analytically, and $h(R,\sigma)$ turns out to depend only on the dimensionless parameter $q = \tfrac{1}{2}(R/\sigma)^2$. Figure 1 shows that $h(R,\sigma)$ is a monotonically increasing function of $q$, which makes it easy to estimate $q$ from a measured filter output $h_m$. Provided that $h_m$ is measured and a priori knowledge concerning the shape of the profile is available, $q_0$ can be estimated by solving $h(q_0) - h_m = 0$. The corresponding $R$ is found by applying $R = \sigma\sqrt{2 q_0}$.
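The inversion step can be sketched as follows. Since $h(q)$ is monotonically increasing, bisection is a safe root finder for $h(q_0) - h_m = 0$; we use it here as an illustrative choice. $R$ then follows from $R = \sigma\sqrt{2q_0}$. The closed form of $h(q)$ for a given profile is not reproduced in the text, so the filter response is passed in as a function. The bracket `[q_lo, q_hi]` and the function names are assumptions.

```python
import math

def invert_increasing(h, h_m, q_lo=1e-6, q_hi=1e3, tol=1e-12, max_iter=200):
    """Solve h(q0) = h_m by bisection, assuming h is monotonically increasing."""
    if not (h(q_lo) <= h_m <= h(q_hi)):
        raise ValueError("h_m lies outside [h(q_lo), h(q_hi)]")
    for _ in range(max_iter):
        q_mid = 0.5 * (q_lo + q_hi)
        if h(q_mid) < h_m:
            q_lo = q_mid          # root lies to the right of q_mid
        else:
            q_hi = q_mid          # root lies at or to the left of q_mid
        if q_hi - q_lo < tol:
            break
    return 0.5 * (q_lo + q_hi)

def radius_from_filter(h, h_m, sigma):
    """R = sigma * sqrt(2 q0), inverting q = (R / sigma)**2 / 2."""
    return sigma * math.sqrt(2.0 * invert_increasing(h, h_m))
```

For a real measurement, `h` would be the analytic pillbox or parabolic expression obtained from (3) to (5). In a quick check, any monotone stand-in (for example the hypothetical `q / (1 + q)`) recovers the radius it was generated from.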



4 Experiments 

The localization accuracy and robustness of the tracing and diameter estimation methods were evaluated using a set of synthetic test images that reflect possible properties of curvilinear structures. The properties considered are the shape of the grey value profile perpendicular to the line, the curvature of the center line of the structure, and the noise level.

Fig. 1. Line diameter filter output $h(q)$ for a pillbox profile (solid line) and a parabolic profile (dashed line), $q = \tfrac{1}{2}(R/\sigma)^2$.

4.1 Bias in center line position 

Images of straight rods with pillbox and parabolic profiles were created with $R$ ranging between 2 and 15. When the line center was set to the central position in the voxel, the estimated center line position turned out to be bias-free. A subvoxel shift $d_s$ in the plane perpendicular to the line leads to an experimentally observed bias $\Delta P_s$. This bias increases with $d_s$ but never exceeds 0.08 ($R = 2$, $\sigma = 2$). Experiments with larger $R$ and larger kernel size $\sigma$ show a smaller bias.

To investigate the bias in center line position introduced by line curvature, we estimated center line positions in a torus with a parabolic line profile. An analysis of the mathematical expressions used to calculate the line center revealed that the relative bias $\Delta P_s/R$ in the center line position depends only on $R_t/R$ and $R/\sigma$, where $R_t$ is the radius of the torus. The experiments show that $\Delta P_s/R$ decreases with $R_t/R$ and $R/\sigma$ (Fig. 2).

Fig. 2. Relative bias in center line position ($\Delta P_s/R$) as a function of $R_t/R$ for settings of $R/\sigma$ of 0.62 (triangles), 1.0 (squares) and 1.67 (dots).

4.2 Bias in diameter estimate

To test the performance of the line diameter estimation method, images containing straight line segments with circular cross section were used. The diameter estimate turned out to be independent of the setting of $\sigma$ in the range $0.2 < R/\sigma < 2$. In images without noise, the bias in the estimated diameter is always below 5% (Fig. 3). In an additional experiment, Gaussian noise in the range between 0 and 30% was added to the image with a pillbox-shaped intensity profile and $R = 20$. The measurements show a bias of 5% in the diameter estimate at a noise level of 10%, increasing to 30% at a noise level of 30%.



5 Discussion 

In our tracing method, the Gaussian derivatives are calculated only at a limited number of points, at a single scale, in the neighborhood of the center line position. Consequently, the response times are short enough to allow interactive measurement of both center line location and diameter.

One of the criteria for a point to be on the center line of the curvilinear structure ($\nabla I \cdot p = 0$) implies that the localization of the line center is bias-free only for a straight line whose center line is positioned in the center of the voxel. A subvoxel shift of the center line introduces a small bias in the center line position.

High curvature is a serious source of bias in the center line location. At high curvature, the position where $\nabla I \cdot p = 0$ is significantly shifted because of the spatial extent of the derivative kernels.

The diameter measurement based on the scale dependency of the zeroth- and second-order Gaussian derivatives performs well in the noiseless situation, even for $R$ as small as 2 times the voxel size. When noise is added to the image, however, the diameter estimation procedure yields a biased line diameter estimate.
Fig. 3. Relative bias in estimation of radius $R$ of the line structure as a function of $R$ for a parabolic profile (squares) and a pillbox profile (dots).
Abstract. We develop an estimation method for the derivative field of an image, based on a Bayesian approach formulated in a geometric way. The maximum probability configuration of the derivative field is found by a gradient descent method, which leads to a non-linear diffusion-type equation with added constraints. The derivatives are assumed to be piecewise smooth, and the Beltrami framework is used in the development of an adaptive smoothing process.



1 Introduction 

It is widely accepted that gradients are of utmost importance in early vision analysis, such as image enhancement and edge detection. Several numerical recipes are known for derivative estimation, all based on fixed square or rectangular neighborhoods of different sizes. This type of estimation does not account for the structure of images and is bound to produce errors. This is especially true near edges, where the estimate on one side of the edge may wrongly influence the estimate on the other side. In places where the image is relatively smooth, least squares estimates of derivatives computed over large neighborhoods give the best results (e.g. the facet approach [2], see also [1]). However, where the underlying image intensity surface is not smooth, it cannot be fitted by a low-degree bivariate polynomial. There, the neighborhood should be smaller and rectangular, with the long axis of the rectangle aligned along the orientation of the directional derivative.
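For contrast with the adaptive approach, a fixed-window least-squares derivative estimate in the spirit of the facet approach can be sketched as a plane fit $c + a x + b y$ over a square window. This is our simplified illustration, using a first-order fit with uniform weights, and it is not the estimator of [2]. The function name is hypothetical.

```python
import numpy as np

def ls_gradient(image, row, col, half=1):
    """Least-squares plane fit I ~ c + a*x + b*y over a (2*half+1)^2 window.

    Returns (a, b) = (dI/dx, dI/dy) at (row, col); x runs along columns,
    y along rows.
    """
    win = np.asarray(image, dtype=float)[row - half:row + half + 1,
                                         col - half:col + half + 1]
    offs = np.arange(-half, half + 1, dtype=float)
    y, x = np.meshgrid(offs, offs, indexing="ij")   # y: row offset, x: column offset
    # On a symmetric window the regressors 1, x, y are orthogonal,
    # so each least-squares slope reduces to a single weighted sum.
    a = np.sum(x * win) / np.sum(x * x)
    b = np.sum(y * win) / np.sum(y * y)
    return a, b
```

On a linear ramp the estimate is exact. One pixel to the left of a step edge, however, the window straddles the edge and reports a spurious gradient, which is the failure mode described above.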

From this viewpoint, it is natural to suggest a neighborhood of varying size and shape, in order to increase both the robustness of the estimate to noise and its correctness. Calculating directly, for each point of the image, its optimal neighborhood for gradient estimation is possible but cumbersome. We therefore propose an alternative approach, which uses geometry-driven diffusion [8] to produce the desired effect implicitly and with sub-pixel accuracy. In this approach we are not concerned with finding an optimal derivative filter. Instead, we formulate Bayesian reasoning directly for the derivative functions themselves.
The paper is organized as follows. In Section 2 we review the Beltrami framework. A Bayesian formulation of the problem, in its linear form, is presented in Section 3. In Section 4 we incorporate the Beltrami framework in the Bayesian paradigm and derive partial differential equations (PDEs) by means of the gradient descent method. Preliminary results are presented in Section 5.



2 A Geometric Measure on Embedded Maps 



We represent an image as a two-dimensional Riemannian surface embedded in a higher dimensional spatial-feature Riemannian manifold [11,10,3,4,5,13,12]. Let $\sigma^\mu$, $\mu = 1,2$, be the local coordinates on the image surface, and let $X^i$, $i = 1,2,\ldots,m$, be the coordinates on the embedding space. The embedding map is then given by

$$ (\sigma^1,\sigma^2) \mapsto \big(X^1(\sigma^1,\sigma^2),\, X^2(\sigma^1,\sigma^2),\, \ldots,\, X^m(\sigma^1,\sigma^2)\big). \tag{1}$$

Riemannian manifolds are manifolds endowed with a bilinear, positive-definite, symmetric tensor called a metric. Denote by $(\Sigma, (g_{\mu\nu}))$ the image manifold and its metric, and by $(M, (h_{ij}))$ the space-feature manifold and its corresponding metric. Then the map $X: \Sigma \to M$ has the following weight [7]:

$$ E[X^i, g_{\mu\nu}, h_{ij}] = \int d^2\sigma\, \sqrt{g}\, g^{\mu\nu}\, (\partial_\mu X^i)(\partial_\nu X^j)\, h_{ij}(X), \tag{2}$$

where the range of indices is $\mu,\nu = 1,2$ and $i,j = 1,\ldots,m = \dim M$. We use the Einstein summation convention: identical indices that appear once up and once down are summed over. We denote by $g$ the determinant of $(g_{\mu\nu})$ and by $(g^{\mu\nu})$ its inverse. In the above expression, $d^2\sigma\sqrt{g}$ is an area element of the image manifold. The rest, i.e. $g^{\mu\nu}(\partial_\mu X^i)(\partial_\nu X^j)h_{ij}(X)$, is a generalization of $L^2$. It is important to note that this expression (as well as the area element) does not depend on the local coordinates one chooses.

The feature evolves in a geometric way via the gradient descent equations

$$ X^i_t \equiv \frac{\partial X^i}{\partial t} = -\frac{1}{2\sqrt{g}}\, h^{il}\, \frac{\delta E}{\delta X^l}. \tag{3}$$



Note that we used our freedom to multiply the Euler-Lagrange equations by 
a strictly positive function and a positive definite matrix. This factor is the 
simplest one that does not change the minimization solution while giving a 
reparameterization invariant expression. This choice guarantees that the flow is 
geometric and does not depend on the parameterization. 

Given that the embedding space is Euclidean, the variational derivative of $E$ with respect to the coordinate functions is given by

$$ -\frac{1}{2\sqrt{g}}\, h^{il}\, \frac{\delta E}{\delta X^l} = \Delta_g X^i = \frac{1}{\sqrt{g}}\, \partial_\mu\!\left(\sqrt{g}\, g^{\mu\nu}\, \partial_\nu X^i\right), \tag{4}$$

where the operator acting on $X^i$ is the natural generalization of the Laplacian from flat spaces to manifolds. It is called the second order differential parameter of Beltrami [6], or in short the Beltrami operator.
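A minimal NumPy sketch of the Beltrami operator in (4) for the simplest case follows: a grey-level image seen as the surface $(x, y, I(x,y))$ in Euclidean $\mathbb{R}^3$ ($h_{ij} = \delta_{ij}$). The induced metric is then $g_{11} = 1 + I_x^2$, $g_{12} = I_x I_y$, $g_{22} = 1 + I_y^2$. The discretisation (central differences via `np.gradient` on a unit grid) and the function name are our assumptions.

```python
import numpy as np

def beltrami_operator(I):
    """Delta_g I = g^{-1/2} d_mu( sqrt(g) g^{mu nu} d_nu I ) for the surface (x, y, I)."""
    I = np.asarray(I, dtype=float)
    Iy, Ix = np.gradient(I)                         # axis 0 = y (rows), axis 1 = x (columns)
    g11, g12, g22 = 1.0 + Ix**2, Ix * Iy, 1.0 + Iy**2
    g = g11 * g22 - g12**2                          # determinant of the induced metric
    sqrt_g = np.sqrt(g)
    inv11, inv12, inv22 = g22 / g, -g12 / g, g11 / g    # inverse metric g^{mu nu}
    # Flux sqrt(g) g^{mu nu} d_nu I, followed by its divergence d_mu(...)
    Fx = sqrt_g * (inv11 * Ix + inv12 * Iy)
    Fy = sqrt_g * (inv12 * Ix + inv22 * Iy)
    div = np.gradient(Fx, axis=1) + np.gradient(Fy, axis=0)
    return div / sqrt_g
```

One explicit step of the flow $I_t = \Delta_g I$ would then read `I + dt * beltrami_operator(I)` for a small time step `dt`.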
3 Bayesian formulation for derivatives estimate



Denote by $(x_r, y_s)$ the sampling points and by $I^0_{rs} = I^0(x_r, y_s)$ the grey levels at the sampling points.

From the data, i.e. $I^0(x_r, y_s)$, we want to infer the underlying function $I(x,y)$ and its gradient vector field $V(x,y)$. The analysis is easier in the continuum, so from now on we treat $I^0$ as a continuous function. In practice we can skip a stage and find the derivatives without referring to the underlying function. The inference is described by the posterior probability distribution



$$ P\big(I(x,y), V(x,y) \mid I^0(x,y)\big) = \frac{P\big(I^0(x,y) \mid I(x,y), V(x,y)\big)\; P\big(I(x,y), V(x,y)\big)}{P\big(I^0(x,y)\big)}. $$

In the numerator, the first factor is the probability of the sampled grey-level values given the vector field $V(x,y)$ (and the underlying function $I$). The second factor is the prior distribution on vector fields assumed by our model. The denominator is independent of $I$ and $V$ and will be ignored from now on.

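Assuming that $P(A \mid B)$ is given by a Gibbsian form,

$$ P(A \mid B) = \frac{1}{Z}\, e^{-\alpha E(A,B)}, $$

with $Z$ a normalizing constant, we get, up to an additive constant,

$$ -\log P\big(V(x,y) \mid I^0(x,y)\big) = \alpha E_1\big(I^0(x,y), V(x,y)\big) + \beta E_2\big(V(x,y)\big). $$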

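If we use the Euclidean $L^2$ norm, we get

$$ E_1\big(I^0(x,y), V(x,y)\big) = \int dx\, dy\, |V - \nabla I^0|^2, $$

$$ E_2\big(V(x,y)\big) = \tfrac{1}{2} C_2 \int dx\, dy\, |\nabla V|^2 + E_3, \tag{5}$$

where the first term is a fidelity term that forces the vector field $V$ to be close to the gradient vector field of $I^0(x,y)$. The second term introduces regularization that guarantees certain smoothness properties of the solution. The second term in $E_2$, namely $E_3$, constrains the vector field to be the gradient of a function. Its form is

$$ E_3\big(V(x,y)\big) = \tfrac{1}{2} C_3 \int dx\, dy\, \big(\epsilon^{\mu\nu}\partial_\mu V_\nu\big)^2 = \tfrac{1}{2} C_3 \int dx\, dy\, \big(V_{1,y} - V_{2,x}\big)^2, $$

where $\epsilon^{\mu\nu}$ is the antisymmetric tensor ($\epsilon^{12} = -\epsilon^{21} = 1$, $\epsilon^{11} = \epsilon^{22} = 0$).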
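The role of $E_3$ can be checked numerically with a small sketch. On a unit grid with central differences (our discretisation), the penalty vanishes for a field obtained as the gradient of a function and is positive for a rotational field. The function name and the default $C_3 = 1$ are illustrative.

```python
import numpy as np

def curl_penalty(V1, V2, C3=1.0):
    """Discrete E_3 = (C3 / 2) * sum (dV1/dy - dV2/dx)^2 on a unit grid."""
    w = np.gradient(V1, axis=0) - np.gradient(V2, axis=1)   # V1_y - V2_x (axis 0 = y)
    return 0.5 * C3 * float(np.sum(w ** 2))
```

For $V = \nabla\varphi$ the mixed differences commute, so the penalty is zero. For the rotation field $V = (-y, x)$, the integrand $(V_{1,y} - V_{2,x})^2 = 4$ at every pixel.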

Alternatively, we may adopt a more sophisticated regularization based on geometric ideas; this is treated in the next section.

Maximization of the posterior probability amounts to minimization of the energy. We do this by means of the gradient descent method, which eventually leads to non-linear diffusion-type equations.
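The gradient descent for the linear (Euclidean) energy of this section can be sketched with an explicit Euler scheme. We weight the fidelity term as $\tfrac{1}{2}C_1\int|V - \nabla I^0|^2$ (the text uses $\alpha$ for this weight) and keep $\tfrac{1}{2}C_2\int|\nabla V|^2 + \tfrac{1}{2}C_3\int(V_{1,y} - V_{2,x})^2$. The resulting descent equations are $V_{1,t} = C_2\Delta V_1 - C_1(V_1 - I^0_x) + C_3\,\partial_y(V_{1,y} - V_{2,x})$ and $V_{2,t} = C_2\Delta V_2 - C_1(V_2 - I^0_y) - C_3\,\partial_x(V_{1,y} - V_{2,x})$. The default parameter values are borrowed from Section 5 for illustration. The discretisation (5-point Laplacian with replicated borders, `np.gradient` for first derivatives) is our assumption, not the authors' scheme.

```python
import numpy as np

def laplacian(u):
    """5-point Laplacian on a unit grid with replicated (edge) borders."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u

def estimate_gradient_linear(I0, C1=0.5, C2=1.0, C3=8.5, dt=0.005, n_iter=150):
    """Explicit Euler descent of the linear energy; returns (V1, V2) ~ (I0_x, I0_y)."""
    I0 = np.asarray(I0, dtype=float)
    gy, gx = np.gradient(I0)                 # noisy derivatives: axis 0 = y, axis 1 = x
    V1, V2 = gx.copy(), gy.copy()            # initial condition V_p = d_p I0
    for _ in range(n_iter):
        w = np.gradient(V1, axis=0) - np.gradient(V2, axis=1)      # V1_y - V2_x
        dV1 = C2 * laplacian(V1) - C1 * (V1 - gx) + C3 * np.gradient(w, axis=0)
        dV2 = C2 * laplacian(V2) - C1 * (V2 - gy) - C3 * np.gradient(w, axis=1)
        V1 += dt * dV1
        V2 += dt * dV2
    return V1, V2
```

On a noise-free ramp the initial derivatives are already a fixed point. On a noisy ramp, the smoothing and curl terms pull the estimate towards the true constant gradient.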
4 Derivatives Estimation: Geometric Method

In this section we incorporate the Beltrami framework into the Bayesian paradigm. We consider the intensity to be part of the feature space, and the five-dimensional embedding map is

$$ \big(X^1 = x,\; X^2 = y,\; X^3 = I(x,y),\; X^4 = V_1(x,y),\; X^5 = V_2(x,y)\big). \tag{6}$$

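Again we assume that these are Cartesian coordinates of $\mathbb{R}^5$, and therefore $h_{ij} = \delta_{ij}$. This implies the following induced metric:

$$ (g_{\mu\nu}) = \begin{pmatrix} 1 + I_x^2 + V_{1x}^2 + V_{2x}^2 & I_x I_y + V_{1x}V_{1y} + V_{2x}V_{2y} \\ I_x I_y + V_{1x}V_{1y} + V_{2x}V_{2y} & 1 + I_y^2 + V_{1y}^2 + V_{2y}^2 \end{pmatrix}. \tag{7}$$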
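The induced metric (7) and its area factor $\sqrt{g}$ can be evaluated on a pixel grid with a short sketch. Each feature channel contributes the outer product of its gradient to $g_{\mu\nu}$. Central differences via `np.gradient` and the function name are our choices.

```python
import numpy as np

def induced_metric(I, V1, V2):
    """Metric (7) of (x, y) -> (x, y, I, V1, V2) in Euclidean R^5.

    Returns g11, g12, g22 and sqrt(g) on the pixel grid.
    """
    channels = [np.asarray(c, dtype=float) for c in (I, V1, V2)]
    g11 = np.ones_like(channels[0])
    g12 = np.zeros_like(channels[0])
    g22 = np.ones_like(channels[0])
    for c in channels:
        cy, cx = np.gradient(c)          # axis 0 = y, axis 1 = x
        g11 += cx * cx
        g12 += cx * cy
        g22 += cy * cy
    sqrt_g = np.sqrt(g11 * g22 - g12 ** 2)
    return g11, g12, g22, sqrt_g
```

For the ramp $I = 2x + 3y$ with the exact gradient field $V = (2, 3)$, the metric is constant: $g_{11} = 5$, $g_{12} = 6$, $g_{22} = 10$, and $\sqrt{g} = \sqrt{14}$.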

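The energy now has two more terms: the first is a fidelity term of the denoised image with respect to the observed one, and the last is an adaptive smoothing term. The functionals are

$$\begin{aligned} E_0\big(I(x,y), I^0(x,y)\big) &= \tfrac{1}{2} C_0 \int dx\, dy\, \sqrt{g}\, (I - I^0)^2, \\ E_1\big(I^0(x,y), V(x,y)\big) &= \tfrac{1}{2} C_1 \int dx\, dy\, \sqrt{g}\, |V - \nabla I^0|^2, \\ E_2\big(V(x,y)\big) &= \tfrac{1}{2} C_2 \int dx\, dy\, \sqrt{g}\, g^{\mu\nu} (\partial_\mu X^i)(\partial_\nu X^i), \\ E_3\big(V(x,y)\big) &= \tfrac{1}{2} C_3 \int dx\, dy\, \sqrt{g}\, \big(\epsilon^{\mu\nu}\partial_\mu V_\nu\big)^2, \end{aligned} \tag{8}$$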

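and since the coefficients of the Levi-Civita connection are zero, we get the following gradient descent system of equations:

$$\begin{aligned} I_t &= C_2\, \Delta_g I - C_0\, (I - I^0), \\ V_{p,t} &= C_2\, \Delta_g V_p - C_1\, (V_p - \partial_p I^0) + \frac{C_3}{\sqrt{g}}\, \epsilon^{\mu p}\, \partial_\mu\!\big(\sqrt{g}\, \epsilon^{\alpha\nu}\partial_\alpha V_\nu\big), \end{aligned} \tag{9}$$

with the initial conditions

$$\begin{aligned} I(x,y,t=0) &= I^0(x,y), \\ V_p(x,y,t=0) &= \partial_p I^0(x,y), \end{aligned} \tag{10}$$

where $I^0(x,y)$ is the given image.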

It is important to understand that $V_1$ and $V_2$ are estimates of $I^0_x$ and $I^0_y$, and not of the denoised $I_x$ and $I_y$.

5 Results and discussion

The PDEs were solved using the explicit Euler scheme, with a forward time derivative and central spatial derivatives. The stencil was taken as $3 \times 3$.





A Geometric Functional for Derivatives Approximation 



Fig. 1. Upper row, left: The original noisy x derivative. Upper row, right: The x 
derivative estimation. Middle row, left: The original noisy y derivative. Middle row, 
right: The y derivative estimation. Lower row, left: The original noisy image. Lower 
row, right: The denoised image. 
We did not optimize any parameter, nor the size of the time steps. For the Euclidean embedding algorithm we chose $C_1 = 0.5$, $C_2 = 1$, $C_3 = 8.5$, and the time step was $\Delta t = 0.005$. The results after 150 iterations are depicted in Fig. 1.

This demonstrates that it is possible to merge Bayesian reasoning and the geometric Beltrami framework in the computation of derivative estimates. The requirement that the obtained functions are the $x$ and $y$ derivatives of some underlying function is formulated through a Lagrange multiplier. Close inspection reveals that this requirement is fulfilled only approximately.

An analysis and comparison with statistically based methods will appear elsewhere [9].
Abstract. Automatic segmentation is performed using watersheds of
the gradient magnitude and compression techniques. Linear Scale-Space 
is used to discover the neighbourhood structure and catchment basins 
are locally merged with Minimum Description Length. The algorithm 
can form a basis for a large range of automatic segmentation algorithms 
based on watersheds, scale-spaces, and compression. 



1 Introduction 

A semantically meaningful segmentation of an indoor scene would consist of piecewise smooth regions corresponding to walls, floor, etc. Such segmentation tasks are often solved indirectly using some similarity measure. This article focuses on the gradient magnitude, since discontinuities are most likely where the gradient magnitude is high.

Generally, segmentation is an NP-complete problem [2] for two-dimensional images; however, reasonable solutions may be found in polynomial time. Segmentation algorithms may be divided into three broad categories: intensity thresholding [9], region split and merge [9], and variational and partial differential equation (PDE) based approaches [6,14]. There are also mixes of these three [8,4,5].

The algorithm presented in this article uses a PDE-based technique [8] for hierarchically splitting regions. The splitting is based on a well-founded, thoroughly studied, and least committed scale analysis [3,8]. The regions are merged with consistent modelling by Minimum Description Length (MDL) [10] to yield parametric descriptions of segments.

2 Least Committed Splitting 

Watersheds on the gradient magnitude partition an image into homogeneous areas in a fast manner. In contrast to the Mumford-Shah functional, the watersheds are not restricted to intersect in T-junctions at 120 degree angles. To regularise the gradient operator, several authors have investigated the properties of the watersheds of the gradient magnitude in various scale-spaces [3,8,4,14]. In the linear scale-space [15], this has led to the development of a semi-automatic segmentation tool [7], henceforth called Olsen's segmentation tool.

* Supported in part by EC Contract No. ERBFMRY-CT96-0049 (VIRGO http://www.ics.forth.gr/virgo) under the TMR Programme.

Olsen's segmentation tool organises segments in a hierarchical data structure, making it convenient to use as a splitting operation. At each scale, the image is partitioned by the watersheds of the gradient magnitude, and the catchment basins are linked across scale by exploiting the deep structure [8]. The linking graph can be approximated with a tree, called the Scale-Space Tree. The tool can pragmatically be extended to other similarity measures. In that case, the deep structure is disregarded and segments are linked merely by their area overlap between scales.
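Linking segments across scales by area overlap can be sketched as follows. Each segment at the finer scale is assigned the label at the coarser scale with which it shares the most pixels. The label-image representation and the function name are our assumptions. Olsen's tool itself links catchment basins via the deep structure, as described above.

```python
import numpy as np

def link_by_overlap(fine, coarse):
    """Map each fine-scale segment label to the coarse-scale label it overlaps most."""
    fine, coarse = np.asarray(fine), np.asarray(coarse)
    parent = {}
    for label in np.unique(fine):
        below = coarse[fine == label]                          # coarse labels under this segment
        values, counts = np.unique(below, return_counts=True)
        parent[int(label)] = int(values[np.argmax(counts)])    # largest area overlap wins
    return parent
```

Applying this between every pair of consecutive scales yields a tree, since each segment receives exactly one parent.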

Figure 1 gives examples of partitions for two similar images. The original images consist of three intensity values (64, 128, 192) plus i.i.d. normal noise with zero mean and standard deviation 5. The ellipses are one pixel further into the light area than into the dark area. The segments at measurement scale zero are shown for three different integration scales. We observe that the ellipses are captured at low and high integration scale, respectively. This indicates that structures of varying size are captured by the Scale-Space Tree at corresponding levels. Hence, the task of the merge algorithm is to perform local scale selection.

Fig. 1. The Scale-Space Tree captures ellipses of varying size. The white lines are the watersheds. Two ellipses are shown using three different integration scales.



3 Specifying Semantics by Compression 

For models where increasing the number of degrees of freedom monotonically decreases the distance to the data, we need some criterion to balance model complexity and model deviation. There are at present three competing model selection methods: Akaike's Information Criterion (AIC), Schwarz's Bayes Information Criterion (BIC), and Rissanen's Minimum Description Length (MDL) [1]. Akaike's original formulation of AIC is known to be inconsistent, in the sense that it does not always converge to the correct model as the number of samples increases. In contrast, both BIC and MDL have been shown to be consistent and to converge to each other. However, MDL is the only method derived from a principle outside the problem of model selection. Thus, in contrast to AIC and BIC, MDL gives a clear interpretation of the resulting model selection, namely the one that achieves optimal compression. Therefore, MDL will be used to merge segments created by Olsen's segmentation tool.

For the model selection criterion to be consistent, every investigated model must include everything needed to completely reproduce the data set. Mixing models from deterministic and stochastic domains is quite natural, since every physical signal contains a portion of randomness. A typical MDL functional is the sum of two terms: the number of bits used to describe the model, $L(\theta)$, and the deviation from the model, $L(x \mid \theta)$. Here $x$ and $\theta$ denote vectors of data and model parameters [10]. Model selection is performed by minimizing this sum:

$$ \theta^{*} = \arg\min_{\theta}\; L(x \mid \theta) + L(\theta). \tag{1}$$

For compression, it is quite natural to study a quantisation of the parameter space with respect to the total code length. In broad terms, the needed precision is in practice inversely proportional to the second order structure of the sum in (1). This structure is in turn inversely proportional to the variance of the estimator. For almost all estimators, this variance is inversely proportional to the number of data points. Except for the square root, we are thus intuitively led to the classical result of [10]:

$$ \lim_{|x|\to\infty} L(x \mid \bar\theta) + L(\bar\theta) = L(x \mid \hat\theta) + L(\hat\theta) + \frac{|\theta|}{2}\log|x| + O(|\theta|), \tag{2}$$

where $\bar\theta$ denotes the truncated parameters, $\hat\theta$ are the maximum likelihood estimates, and $|\theta|$ is the number of parameters. This limit has recently been sharpened to an $o(1)$ estimate [1]. However, the per-data-point improvement is negligible when $|x| \gg |\theta|$, so (2) suffices for large segments.
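The trade-off in (2) can be illustrated with a sketch that chooses the degree of a 1-D polynomial by an approximate code length. For Gaussian residuals with maximum likelihood variance $\hat\sigma^2$, the approximation is $\tfrac{|x|}{2}\log(2\pi e\,\hat\sigma^2) + \tfrac{|\theta|}{2}\log|x|$, measured in nats. The $O(|\theta|)$ term and the coding of the parameter values are dropped. This is therefore a simplified illustration, not the full segment functional.

```python
import numpy as np

def mdl_code_length(x, y, degree):
    """Approximate code length (nats) of y ~ polynomial(x) of the given degree."""
    n = len(x)
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma2 = max(float(np.mean(resid ** 2)), 1e-300)   # ML variance; guard against log(0)
    k = degree + 1                                       # number of parameters |theta|
    return 0.5 * n * np.log(2 * np.pi * np.e * sigma2) + 0.5 * k * np.log(n)

def select_degree(x, y, max_degree=3):
    """Return the degree with the smallest code length, plus all code lengths."""
    lengths = [mdl_code_length(x, y, d) for d in range(max_degree + 1)]
    return int(np.argmin(lengths)), lengths
```

Adding a parameter pays off only if it shortens the residual code by more than $\tfrac{1}{2}\log|x|$. Strongly curved data therefore rejects degrees 0 and 1, while additional degrees are accepted only when the residual gain exceeds that price.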

A coding scheme for segmentation naturally divides into a code for the border and a code for the interior [5]. For many large segments, the code length of the border will naturally tend to be small compared with the code length of the interior. There is a large group of shapes for which this is not the case; however, we do not expect these shapes to be typical. A simple chain code for the border will therefore suffice. A better, model-driven code for borders may be found in [12]. For the interior, the choice of model class is much more interesting. In the piecewise smooth case, low order polynomials are obviously suitable and can be interpreted as the extension of the local structure. Harmonic representations are also possible, and cosine waves may be versatile enough to handle both smooth and texture-like regions. For simplicity, however, we will use the class of low order polynomials plus i.i.d. normal noise with zero mean. We will use the centroid of a segment as origin, and the parameters will be coded with the universal prior of integers [10].

The least squares fitting procedure is well suited for normally distributed noise. However, it is ill suited in the case of outliers, since just a single deviating point can make the fit arbitrarily bad. Such outliers do occur for simple image structures such as corners and T-junctions. In the spirit of Least Median of Squares [11], we have implemented a method with three steps. It uses 1% of a segment's pixels as test inliers (an elemental subset), refits the model on this subset, and calculates the median of the squared deviations of the inliers as a quality measure. This process is iterated until sufficient confidence is reached. The parameters of the subset that minimises the quality measure are then used to dissect the segment into inliers and outliers. In contrast to statistical models of outliers [11], we order the outliers by their distance to the robust fit. By coding the outliers with the universal distribution of integers, we may iteratively find the optimal division between inliers and outliers. This has proven to be a very effective outlier detector.
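A Least Median of Squares fit with random elemental subsets can be sketched as below. There are deliberate simplifications relative to the text. The subset is the minimal one (degree + 1 points) rather than 1% of the pixels. The quality measure is the median squared residual over all points, as in classical LMedS. Finally, the MDL-based choice of the inlier/outlier split is replaced by a fixed threshold. All names are hypothetical.

```python
import numpy as np

def lmeds_fit(x, y, degree=1, n_trials=200, seed=0):
    """Least Median of Squares polynomial fit via random elemental subsets."""
    rng = np.random.default_rng(seed)
    k = degree + 1                                   # minimal subset that fixes the model
    best_coeffs, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(len(x), size=k, replace=False)
        coeffs = np.polyfit(x[idx], y[idx], degree)  # exact fit through the subset
        med = np.median((y - np.polyval(coeffs, x)) ** 2)
        if med < best_med:
            best_coeffs, best_med = coeffs, med
    return best_coeffs, best_med

def inlier_mask(x, y, coeffs, threshold):
    """Points whose squared residual to the robust fit is at most `threshold`."""
    return (y - np.polyval(coeffs, x)) ** 2 <= threshold
```

Sorting the squared residuals returned by the robust fit would give the outlier ordering that the text codes with the universal distribution of integers.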

We finally derive the total MDL functional for a segment as

$$ L_s = \frac{|x|}{2}\log\Big(2\pi e\, \frac{1}{|x|}\sum_i \big(x_i - f(x_i,\theta)\big)^2\Big) + \sum_j \log^*(\theta_j) + \frac{|\theta|}{2}\log|x| + |\partial x| + \log^*(\text{outliers}), \tag{3}$$

where the maximum likelihood estimate $\hat\sigma^2 = \sum_i (x_i - f(x_i,\theta))^2 / |x|$ has been used. Here $f$ is a function from the class, $\log^*$ is minus the logarithm of the universal distribution [10], and $\partial x$ is a 4-connected chain code of the boundary. We have divided the code length estimate for the chain code by two, since almost all border points are shared by exactly two segments. To code the outliers, their coordinates and values must be supplied as integers. By independence, the total code length for the image is $L = \sum_s L_s$. The task of the merge algorithm is thus to find a minimum of $L$ over the number and placement of segments. This is in general an intractable problem [2]. The following section gives a reasonable, fast, but suboptimal algorithm.



3.1 A General Merge Algorithm 

The goal of our merge algorithm is to consider only local neighbourhoods, in a fine-to-coarse manner. A single iteration of the algorithm is illustrated in Figure 2. Leaves A, B, C, and D are all segments tracked to measurement scale. At the bottom level, we find the best local merge. In this case, segment A has no siblings to merge with, while all possible merges of B, C, and D are examined. For the example we assume that merging C and D is the optimal local solution. When all sibling tuples have been optimally locally merged, the remaining siblings take the place of their parent, and the algorithm is reiterated on the smaller tree.

Fig. 2. A single step of the merge algorithm. LEFT: the original tree. MIDDLE: subtree B, C, and D is merged into B and C+D. RIGHT: children replace parents.
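Examining all possible merges of a sibling tuple amounts to enumerating the set partitions of the siblings and keeping the one with the smallest total code length. A sketch follows. A caller-supplied cost per merged group stands in for the segment code length of (3), and the function names are our own.

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                       # `first` stays a separate segment
        for i in range(len(part)):                   # or is merged into an existing group
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def best_local_merge(siblings, cost):
    """Partition of the sibling segments with the smallest summed cost."""
    return min(set_partitions(list(siblings)),
               key=lambda part: sum(cost(group) for group in part))
```

For the three siblings B, C and D of Fig. 2, there are five candidate partitions (the Bell number $B_3$). Exhaustive search is therefore cheap for small sibling tuples, although the count grows quickly with the number of siblings.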

Since there is no direct cross-talk between neighbouring sibling tuples, the tree defines a hierarchical neighbourhood structure. The final segmentation result therefore cannot be better than the neighbourhood structure of the tree allows. Like all merge algorithms, this algorithm does not guarantee a global optimum. Its advantage is that the search space is restricted by the geometrical structure of the image defined by the Scale-Space Tree.

4 Shapes in Data 

Interpreting data has two basic steps. Firstly, a proper syntax must be found, which can contain all data sets to be considered. Secondly, a sentence must be composed that describes a particular data set. This article has described an algorithm that uses the Scale-Space Tree to define the neighbourhood structure of regions. The algorithm then seeks the particular combination of neighbourhoods that reduces the description length according to a prespecified preference. Figure 3 shows several examples of segmentations produced by the algorithm. On the simple images, we observe that the algorithm correctly joins segments from various levels of the Scale-Space Tree for a remarkable range of sizes and intensity differences. On more complex images, such as those shown in Figure 4, the algorithm displays a range of behaviours. It is difficult, if not impossible, to obtain the 'correct' segmentation of such images. Nevertheless, we conclude that the algorithm does distinguish a number of significant regions, and that the concept of lossless coding allows for a consistent discussion of different segmentation algorithms.

Fig. 3. Segmentation of simple structures. Left images are as in Figure 1, and right images show light ellipses on a lighter background (values 112, 128 and standard deviation 5).
Fig. 4. Segmentation of an indoor scene and an MR slice.
Abstract. In this paper binocular stereo in a linear scale-space setting
is studied. A theoretical extension of previous work involving the optic 
flow constraint equation is obtained, which is embedded in a robust top- 
down algorithm. The method is illustrated by some examples. 



1 Introduction 

Stereo and optic flow are closely related. One way to study motion and short-baseline stereo is via the optic flow equation [3]. Recently, it has been realized that, because we are dealing with observed data, the equation has to be embedded in scale-space ("brought under the aperture") [1]. This has been successfully applied to optic flow and binocular stereo extraction. Because of the scale involved, the latter is no longer necessarily restricted to short-baseline stereo [8,7].

This paper extends the theory of [1] (and [7] for binocular stereo) by taking time to be discrete, i.e., no filtering is performed in that direction. This is of course the case in binocular stereo, where only two frames are present. We show that in that case higher order polynomials in the disparity are obtained. The disparity then has to be expanded in a spatial series to obtain a solution, as in [1].

We incorporate this method in a top-down patch-based stereo algorithm. During the descent over scale, consistency is enforced based on the scale and the residual (i.e. the value of the function minimized in a least squares process), as in [5].

2 Review of optic flow under the aperture 

This section briefly reviews the optic flow equation under the aperture [1,7,8]. The classical optic flow constraint equation for a spatio-temporal image $I$ in 2D+time is given by

$$ I_t + u^x I_x + u^y I_y = 0, \tag{1}$$

where the subscripts denote partial differentiation and the superscripts denote the components of the flow. In discrete images we cannot take derivatives directly. We therefore have to obtain an equation with regularized derivatives, i.e. Gaussian derivatives at a certain scale.

* This work was supported by the NWO-Council Earth and Life Sciences (ALW), 
To obtain an optic flow equation involving regularized derivatives, we (formally) convolve both sides of eq. (1) with $\gamma$, the Gaussian kernel at a certain scale (see also [1,7,8]):

$$ \big(I_t + u^x I_x + u^y I_y\big) * \gamma = 0. \tag{2}$$

(Since the values $u^x$ and $u^y$ depend on the position, the left-hand side of the equation is generally not equal to $L_t + u^x L_x + u^y L_y$, where $L = I * \gamma$.) To move the differentiation operator from the image to the aperture, we have to use partial integration. For this reason, we have to model the unknown velocity (or disparity) field $u^x$ and $u^y$ with a polynomial function up to some order. For $u^x$ we get a local spatial series:

$$ u^x(x,y) = \sum_{m=0}^{M}\sum_{n=0}^{m} U^x_{n(m-n)}(x_0,y_0)\, (x-x_0)^n\, (y-y_0)^{m-n}. \tag{3}$$

A similar equation is derived for $u^y$. Derivatives of $\gamma$ are Hermite polynomials times the kernel $\gamma$ itself. Using this fact, partial integration gives an equation in $(M+2)(M+1)$ unknowns (the $U^x_{ij}$ and $U^y_{ij}$), with the derivatives moved from the image to the aperture. We clarify this with an example: take $M = 1$ in the series expansions. Then eq. (1) is replaced by

$$ L_t + u^x L_x + \sigma^2 u^x_x L_{xx} + \sigma^2 u^x_y L_{xy} + u^y L_y + \sigma^2 u^y_x L_{xy} + \sigma^2 u^y_y L_{yy} = 0, \tag{4}$$

where $\sigma$ is the scale of the Gaussian operator $\gamma$. The symbols $u^x$, $u^x_x$ and $u^x_y$ stand for the coefficients $U^x_{00}$, $U^x_{10}$ and $U^x_{01}$, and similarly for $u^y$. We cannot solve for six unknowns from one equation, so we have to use additional equations. In [1], extra equations are obtained by taking partial derivatives of the equation up to the order of the truncation. Due to the aperture problem, additional knowledge has to be included, e.g., that the flow is only horizontal. One gets a linear system, which can easily be solved. In binocular stereo, the temporal derivative is replaced by the difference between the right and the left images. The additional scale parameter can be used to select a proper scale for the computations [7,8,9].
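The simplest instance of this idea can be sketched for binocular stereo. We truncate at $M = 0$, i.e. a constant horizontal disparity $u$ over a patch. $L_t$ is replaced by the difference of the blurred right and left images, and $\sum (L_t + u L_x)^2$ is minimised in closed form. Averaging the $x$-derivatives of both images, the separable convolution helpers, and all names are our choices. This is not the $M = 1$ system of (4).

```python
import numpy as np

def _gauss_kernels(sigma):
    """Sampled, normalised Gaussian and its first derivative (radius 4 sigma)."""
    r = int(4 * sigma + 0.5)
    x = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return g, -x / sigma ** 2 * g

def _aperture(img, kx, ky):
    """Separable filtering: kx along x (axis 1), ky along y (axis 0); borders dropped."""
    tmp = np.array([np.convolve(row, kx, mode="valid") for row in img])
    return np.array([np.convolve(col, ky, mode="valid") for col in tmp.T]).T

def disparity_m0(left, right, sigma=2.0):
    """Constant horizontal disparity u minimising sum (L_t + u L_x)^2 over the patch."""
    g, g1 = _gauss_kernels(sigma)
    left, right = np.asarray(left, float), np.asarray(right, float)
    Lt = _aperture(right, g, g) - _aperture(left, g, g)             # right - left replaces L_t
    Lx = 0.5 * (_aperture(left, g1, g) + _aperture(right, g1, g))   # mean blurred x-derivative
    return -np.sum(Lt * Lx) / np.sum(Lx * Lx)
```

For a right image that equals the left image shifted by half a pixel, the estimate is close to 0.5. The linearisation error grows with the disparity, which motivates the higher-order expansion of the next section.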



3 Higher order Taylor expansion 

The mathematical idea behind the approach discussed in the previous section is to compute the infinitesimal velocity. A theoretically sound continuous interpretation can be given for the discretely sampled space-time. However, what we actually want is the displacement between two adjacent slices in the temporal direction, not the infinitesimal displacement. In binocular stereo, the main focus of our approach, temporal smoothing cannot be performed at all.

Let $I(x,y,t)$ be a spatio-temporal image. To obtain the displacements between two frames, one taken at $t = t_0$ and one taken at $t = t_1$, we have to solve

$$ I\Big(x - \frac{a^x(x,y)}{2},\, y - \frac{a^y(x,y)}{2},\, t_0\Big) = I\Big(x + \frac{a^x(x,y)}{2},\, y + \frac{a^y(x,y)}{2},\, t_1\Big), \tag{5}$$

where $a^x(x,y)$ and $a^y(x,y)$ are the presumed model for the flow (disparity). For clarity, we write $a^x$ for $a^x(x,y)$ and $a^y$ for $a^y(x,y)$ in the formulas below. Both $I(x - a^x/2, y - a^y/2, t_0)$ and $I(x + a^x/2, y + a^y/2, t_1)$ are expanded in a Taylor series. We then truncate the series at some order $K$ to obtain the displacement:



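$$0 = I\left(x + \frac{a^x}{2}, y + \frac{a^y}{2}, t_1\right) - I\left(x - \frac{a^x}{2}, y - \frac{a^y}{2}, t_0\right) \approx \sum_{k=0}^{K}\sum_{l=0}^{k} \frac{(a^x)^l\,(a^y)^{k-l}}{2^k\,l!\,(k-l)!} \left[\frac{\partial^k I(x,y,t_1)}{\partial x^l\,\partial y^{k-l}} - (-1)^k\,\frac{\partial^k I(x,y,t_0)}{\partial x^l\,\partial y^{k-l}}\right] \qquad (6)$$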



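for a certain K. So far we have formally differentiated $I(x,y,t)$; now we put the equation under the aperture γ. Following the lines of section 2, we obtain:

$$\iint \sum_{k=0}^{K}\sum_{l=0}^{k} \frac{(a^x)^l\,(a^y)^{k-l}}{2^k\,l!\,(k-l)!} \left[\frac{\partial^k I(x,y,t_1)}{\partial x^l\,\partial y^{k-l}} - (-1)^k\,\frac{\partial^k I(x,y,t_0)}{\partial x^l\,\partial y^{k-l}}\right] \gamma(x,y)\,dx\,dy = 0 \qquad (7)$$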



We have to move the derivative operator from the images to the aperture. Before we can do so, we have to expand $a^x$ and $a^y$ in a truncated series. Note that this is not a truncated Taylor series, and that its coefficients actually depend on σ, since eq. (7) has to be satisfied for the approximation.
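The following 1-D check illustrates what the truncation order K in eq. (6) does. It evaluates the truncated right-hand side of eq. (6) at the true displacement. With horizontal displacement only, just the terms with l = k remain. The test signal $I(x, t_0) = \sin x$, $I(x, t_1) = \sin(x - d)$, the shift d = 0.5 and the helper name are illustrative assumptions. For this pair eq. (5) holds exactly with $a^x = d$, so the truncated sum should tend to zero as K grows.

```python
import math


def taylor_residual(x, a, d, K):
    """Right-hand side of eq. (6) in 1-D, truncated at order K (hypothetical helper).

    Test pair: I(x, t0) = sin(x), I(x, t1) = sin(x - d).
    The k-th derivative of sin(s) is sin(s + k*pi/2).
    """
    total = 0.0
    for k in range(K + 1):
        dk_t1 = math.sin(x - d + k * math.pi / 2)  # d^k I(x, t1) / dx^k
        dk_t0 = math.sin(x + k * math.pi / 2)      # d^k I(x, t0) / dx^k
        total += a**k / (2**k * math.factorial(k)) * (dk_t1 - (-1) ** k * dk_t0)
    return total


# Residual at the true displacement a = d for K = 0, 1, ..., 8.
residuals = [abs(taylor_residual(0.3, a=0.5, d=0.5, K=K)) for K in range(9)]
```

Here the residual falls from about 0.49 at K = 0 to below 1e-9 at K = 8, because the neglected terms scale like $(a/2)^k/k!$.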



$$a^x(x,y) = \sum_{m=0}^{M}\sum_{n=0}^{m} \frac{a^x_{n,m-n}}{n!\,(m-n)!}\, x^n\, y^{m-n} \qquad (8)$$



and similarly for $a^y$. Using this expansion, the derivatives can be moved from the images to the aperture in eq. (7). This yields one equation involving regularized derivatives.
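Moving derivatives onto the aperture relies on Gaussian moment identities. For a 1-D Gaussian γ of scale σ, $x\gamma = -\sigma^2\gamma'$ and $x^2\gamma = \sigma^4\gamma'' + \sigma^2\gamma$. Integration by parts therefore gives $\int x f'\gamma\,dx = \sigma^2\int f''\gamma\,dx$ and $\int x^2 f\gamma\,dx = \sigma^4\int f''\gamma\,dx + \sigma^2\int f\gamma\,dx$. Identities of this kind produce the σ² and σ⁴ factors in Example 1. The sketch below only checks them numerically; the grid, the test function and σ are arbitrary choices of ours.

```python
import numpy as np

sigma = 1.5
x = np.linspace(-20.0, 20.0, 40001)
dx = x[1] - x[0]
gamma = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

# Smooth test "image" and its analytic first and second derivatives.
f = np.sin(0.7 * x) + 0.3 * np.cos(1.3 * x + 0.2)
f1 = 0.7 * np.cos(0.7 * x) - 0.39 * np.sin(1.3 * x + 0.2)
f2 = -0.49 * np.sin(0.7 * x) - 0.507 * np.cos(1.3 * x + 0.2)


def regularise(h):
    """Integral of h * gamma, approximated by a Riemann sum on the grid."""
    return np.sum(h * gamma) * dx


# First moment: x * gamma = -sigma^2 * gamma'.
lhs1, rhs1 = regularise(x * f1), sigma**2 * regularise(f2)
# Second moment: x^2 * gamma = sigma^4 * gamma'' + sigma^2 * gamma.
lhs2 = regularise(x**2 * f)
rhs2 = sigma**4 * regularise(f2) + sigma**2 * regularise(f)
```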

We could obtain more equations by using, in addition to γ in eq. (7), certain derivatives of γ, just as is done in the method discussed in the previous section. Adding physical constraints, which are necessary due to the aperture problem, leads to a finite number of possible solutions, from which the proper one has to be chosen.

In this paper we use a different approach to obtain a solution from eq. (7) in combination with eq. (8) (and the analogous one for $a^y$). The motion at a point is given by the least squares solution to the equations in a neighborhood of that point. We explain the developed algorithm in the next section. We finish with an example of the equation derived above:



Example 1. K = 2, M = 1, horizontal flow: 



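$$\begin{aligned}
&(L^1 - L^0) + \frac{a^x_{00}}{2}\left(L^1_x + L^0_x\right) + \frac{a^x_{10}\sigma^2}{2}\left(L^1_{xx} + L^0_{xx}\right) + \frac{a^x_{01}\sigma^2}{2}\left(L^1_{xy} + L^0_{xy}\right) \\
&+ \frac{(a^x_{00})^2}{8}\left(L^1_{xx} - L^0_{xx}\right) + \frac{(a^x_{10})^2\sigma^2}{8}\left(L^1_{xx} - L^0_{xx} + \sigma^2\left(L^1_{xxxx} - L^0_{xxxx}\right)\right) \\
&+ \frac{(a^x_{01})^2\sigma^2}{8}\left(L^1_{xx} - L^0_{xx} + \sigma^2\left(L^1_{xxyy} - L^0_{xxyy}\right)\right) + \frac{a^x_{00}a^x_{10}\sigma^2}{4}\left(L^1_{xxx} - L^0_{xxx}\right) \\
&+ \frac{a^x_{00}a^x_{01}\sigma^2}{4}\left(L^1_{xxy} - L^0_{xxy}\right) + \frac{a^x_{10}a^x_{01}\sigma^4}{4}\left(L^1_{xxxy} - L^0_{xxxy}\right) = 0
\end{aligned} \qquad (9)$$

where $L^i$ and its subscripted forms denote the image at $t = t_i$ and its derivatives, regularized with the Gaussian aperture γ of scale σ.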
4 Over-constrained systems

Using the derivatives from one point can give rise to noisy results, even though they are computed with a filter of finite width. Another severe problem that occurs in practice is the difference in illumination between the left and right images.

Possible solutions to these problems are preprocessing the images with a small Laplacian filter, or using higher order derivatives of γ instead of γ as the initial aperture filter. These approaches will be studied in further work on this subject. In this paper we propose a least squares solution to eq. (7) over a patch, where we add a new variable to account for the local greyvalue difference between the images. It is a generalization of [6], which corresponds to our case with K = 1 and M = 0.

For this generalization we write eq. (7), with the additional greyvalue offset g and with the expansions for $a^x$ and $a^y$ (eq. (8)), as follows:

$$f(a_M, x, y, t_0, t_1, K) + g = 0 \qquad (10)$$

where $a_M$ is the vector containing all coefficients of eq. (8) and of the analogous expansion of $a^y$, and g is the greyvalue offset that needs to be computed. The least squares solution is then given by
squares solution is given by 

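$$(\hat a_M, \hat g) = \operatorname*{arg\,min}_{(a_M, g) \in \mathbb{R}^{(M+1)(M+2)+1}} \sum_{(x,y) \in \Omega} \left(f(a_M, x, y, t_0, t_1, K) + g\right)^2 \qquad (11)$$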

The neighborhood Ω has to contain at least (M + 1)(M + 2) + 2 points to make the system over-constrained.
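Eq. (8) has (M + 1)(M + 2)/2 coefficients per displacement component. Together, $a^x$ and $a^y$ therefore contribute (M + 1)(M + 2) unknowns and g adds one more. An over-constrained patch needs at least one point more than the number of unknowns. The sketch below evaluates the model of eq. (8) and reproduces this count. The dictionary keyed by (n, m - n) and the helper names are our own choices.

```python
from math import factorial


def coefficient_keys(M):
    """All index pairs (n, m - n) with 0 <= n <= m <= M, as in eq. (8)."""
    return [(n, m - n) for m in range(M + 1) for n in range(m + 1)]


def displacement_model(coeffs, x, y):
    """Evaluate eq. (8): sum of a_{n,m-n} x^n y^(m-n) / (n! (m-n)!).

    coeffs maps (n, m - n) -> a_{n,m-n}; keys (1, 0) and (0, 1) multiply x and y.
    """
    return sum(c * x**p * y**q / (factorial(p) * factorial(q))
               for (p, q), c in coeffs.items())


def min_patch_size(M):
    """Points for an over-constrained patch: (M+1)(M+2) + 1 unknowns, plus one."""
    unknowns = 2 * len(coefficient_keys(M)) + 1  # a^x, a^y coefficients and g
    return unknowns + 1
```

For M = 1 this gives 8 points, in agreement with (M + 1)(M + 2) + 2.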

In the case of M = 0 and only horizontal displacement ($a^y_{00} = 0$), we find the optimal parameters as follows.

At the minimum of $F(a^x_{00}, x, y, t_0, t_1, K, g) := \sum_{(x,y)\in\Omega} \left(f(a_M, x, y, t_0, t_1, K) + g\right)^2$, both $\partial F/\partial a^x_{00} = 0$ and $\partial F/\partial g = 0$. From $\partial F/\partial g = 0$ it follows that, at a stationary point, g can be written as a polynomial in $a^x_{00}$. Inserting this polynomial into $\partial F/\partial a^x_{00} = 0$ shows that we have to solve a polynomial equation of degree (2K − 1) in $a^x_{00}$ to find the global minimum. If we have to deal with both horizontal and vertical displacement and K > 1, or if we have only horizontal displacements but M > 0 and K > 1, we have to use a minimization algorithm [10], which might yield only a local minimum.
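The M = 0, purely horizontal case can be sketched directly. With a constant displacement the coefficients stay outside the aperture integral. At each patch point, f is therefore a polynomial of degree K in $a^x_{00}$, with coefficients $(L^1_{x^k} - (-1)^k L^0_{x^k})/(2^k k!)$. Setting ∂F/∂g = 0 makes the optimal g equal to minus the patch mean of f. What remains is a degree-2K objective whose stationary points are the real roots of a degree-(2K − 1) polynomial; because that degree is odd, at least one real root exists. Comparing F at those roots picks the global minimum. The function names and the input layout (rows of regularized x-derivatives of orders 0..K at the N patch points, computed elsewhere) are hypothetical.

```python
from math import factorial

import numpy as np
from numpy.polynomial import polynomial as P


def taylor_coefficients(L1, L0):
    """c[k, n]: coefficient of a**k at patch point n (M = 0, horizontal flow).

    L1[k, n], L0[k, n] are the regularised k-th x-derivatives of the right (t1)
    and left (t0) image; c_k = (L1_k - (-1)^k L0_k) / (2^k k!).
    """
    K = L1.shape[0] - 1
    k = np.arange(K + 1)[:, None]
    scale = np.array([2.0**j * factorial(j) for j in range(K + 1)])[:, None]
    return (L1 - (-1.0) ** k * L0) / scale


def solve_m0_horizontal(c):
    """Global minimiser (a, g) of sum_n (sum_k c[k, n] a**k + g)**2."""
    # For fixed a the optimal offset is g = -mean_n f_n(a), itself a polynomial
    # in a; substituting it simply centres every coefficient row.
    d = c - c.mean(axis=1, keepdims=True)
    # Objective F(a) = sum_n r_n(a)**2, a polynomial of degree 2K
    # (numpy.polynomial stores coefficients lowest power first).
    F = np.zeros(1)
    for n in range(c.shape[1]):
        F = P.polyadd(F, P.polymul(d[:, n], d[:, n]))
    # Stationary points: real roots of F'(a), of odd degree 2K - 1.
    roots = P.polyroots(P.polyder(F))
    real = roots[np.abs(roots.imag) < 1e-8].real
    a = real[np.argmin(P.polyval(real, F))]
    g = -P.polyval(a, c.mean(axis=1))
    return a, g
```

For K = 1 the objective is quadratic and the procedure reduces to an ordinary linear least squares fit with offset, in the spirit of [6].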

5 Top-down algorithm 

The approach described above is implemented in a top-down algorithm, in which the solution at every scale level has to fulfill certain conditions, similar to some of those used in [5], to be labeled as reliable.

The filter size and the size of the correlation region, which we take equal to the filter size (similar to [4, Ch. 14]), restrict the size of the solution. If the solution is much larger than the filter size, it becomes increasingly unreliable. Therefore we only allow solutions that are of the same order of magnitude as σ. We compute the disparity for a stack of scales and, at every scale, accept only those u and v that are both smaller than σ.
We impose a second restriction on the value of the minimized function at the minimum found. If this so-called residual ([5]) is larger than a certain threshold, the found displacement is regarded as wrong (e.g., a local minimum instead of the global minimum, or an occluded point) and is therefore removed.
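The two reliability tests can be sketched as a boolean mask over the points of one scale level. Here the residual threshold is expressed as discarding a fixed fraction of the points that survive the size test, which is how the experiments in section 6 set it. The function name, the array layout and the use of a quantile are our own assumptions.

```python
import numpy as np


def reliable_mask(u, v, residual, sigma, drop_fraction=0.1):
    """True where a displacement passes both tests at scale sigma (hypothetical helper).

    1. Size test: |u| < sigma and |v| < sigma.
    2. Residual test: among the points passing (1), drop the drop_fraction
       with the largest least-squares residual.
    """
    ok = (np.abs(u) < sigma) & (np.abs(v) < sigma)
    if ok.any() and drop_fraction > 0:
        threshold = np.quantile(residual[ok], 1.0 - drop_fraction)
        ok &= residual <= threshold
    return ok
```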

Using the points which fulfill both restrictions, a displacement field for the whole image is obtained by interpolation. Going down in scale, we first compensate for the obtained flow field and then repeat the procedure. For M = 1, $|u_x| < 2$ and $|u_y| < 1$ are required (note that the restriction on $u_x$ is imposed by the ordering constraint).
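One coarse-to-fine step can be sketched in 1-D: interpolate the reliable displacements to the whole grid, then compensate for them before moving to the next, finer level. With eq. (5) and a constant displacement, the right signal sampled at x + d matches the left signal at x. The use of `np.interp`, linear interpolation, and warping only the right signal are our own assumptions; the text does not state which interpolation scheme is used.

```python
import numpy as np


def fill_displacement(positions, disparity, reliable):
    """Linearly interpolate the reliable disparities to every grid position."""
    return np.interp(positions, positions[reliable], disparity[reliable])


def compensate(right, positions, disparity):
    """Resample the right signal at x + d(x) so that it lines up with the left."""
    return np.interp(positions + disparity, positions, right)
```

After compensation, the next finer level only has to estimate the remaining, smaller displacement, which keeps it within the size test |u| < σ.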



6 Results 

We show some results of the methods on a synthetic random-dot pair, whose 3-D scene is a paraboloid on a flat background, and on Bill Hoff's fruit pair [2]. On every level of the algorithm we first checked the size of the results, after which 10% of the remaining points were removed based on the size of the residual.

In Fig. 1 the input random-dot pair is shown, together with the results for K = 1 and M = 0, K = 3 and M = 0, and K = 2 and M = 1. The results of all methods are quite similar. The method with K = 1 and M = 0 retains slightly more points than the method with K = 2 and M = 1, but the standard deviation of the error (compared with the ground truth) is slightly smaller for the method with K = 2 and M = 1.

The fruit images (Fig. 2) show more differences. In the first row, the results of the same expansions as in the random-dot example are shown. The method with K = 2 and M = 1 discards more points than the other methods, but also yields fewer points with a wrong disparity. To compare the methods better, we varied the number of points discarded on the basis of the residual such that every method retained 60% of all points. The result using K = 2 and M = 1 still contains fewer outliers.



7 Discussion 

Binocular stereo in scale-space has been studied. A theoretical extension of the optic flow work in [1], especially suited for stereo, has been described. This theory has been embedded in a robust top-down algorithm. Some examples have been given to illustrate the derived results, but more study is needed. For instance, what are the results if we replace the patch approach with a preprocessing step on the images to overcome illumination differences? For the same purpose, could we use different filters instead of γ, especially higher order derivatives of γ?
Fig. 1. From left to right: the left input image, the right input image, and the results for K = 1 and M = 0, K = 3 and M = 0, and K = 2 and M = 1.




Fig. 2. In the left column the input images are displayed. In the remaining part of the top row the results for K = 1 and M = 0, K = 3 and M = 0, and K = 2 and M = 1 are displayed. In the second row the same expansions are used, but now 60% of all points are retained.
Abstract. In this paper, classical nonlinear diffusion methods of machine vision are revisited in the light of recent results in nonlinear stability analysis. Global exponential convergence rates are quantified and suggest specific choices of nonlinearities and image coupling terms. In particular, global stability and exponential convergence can be guaranteed for nonlinear filtering of time-varying images.



1 Introduction 

Nonlinear reaction-diffusion processes are pervasive in physics [7]. In [11,12], we extended recent results on stability theory, referred to as contraction analysis [10], to partial differential equations describing time-varying nonlinear reaction-diffusion processes. We also showed that analyzing global stability and determining convergence rates is very simple for such processes. In this paper, classical nonlinear diffusion methods of machine vision [1,2,3,5,6,8,9,13,16,17,18,19,20] are revisited in the light of these recent results.

Section 2 summarizes the contraction properties of nonlinear reaction-diffusion-convection equations of the form

$$\frac{\partial\phi}{\partial t} = \operatorname{div} h(\nabla\phi, t) + v^T(t)\,\nabla\phi + f(\phi, x, t) \qquad (1)$$

and explicitly quantifies stability and convergence rates. In section 3, these results are applied to classical nonlinear diffusion methods of machine vision, where they suggest specific choices of nonlinearities and image coupling terms. In particular, global stability and exponential convergence can be guaranteed for nonlinear filtering of time-varying (video) images. Brief concluding remarks are offered in section 4.

2 Contraction Analysis of Nonlinear Diffusion Processes 

Differential approximation is the basis of all linearized stability analysis. What is new in contraction analysis is that differential stability analysis can be made exact, and in turn yields global exponential stability results [10].
The theory can be applied simply to important classes of physical distributed processes. In particular, consider the system (1) on a bounded m-dimensional continuum $\mathcal{V}$. Let $l_{k,\max}$ be the diameter (maximum length) of $\mathcal{V}$ along the k-th axis, and let $\lambda_{\partial h/\partial\nabla\phi}$ be a lower bound on the smallest eigenvalue of the symmetric part of $\frac{\partial h}{\partial\nabla\phi} \geq 0$ on $\mathcal{V}$. It can then be shown [11,12] that

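Theorem 1. Consider the nonlinear reaction-diffusion-convection equation (1), where $\frac{\partial h}{\partial\nabla\phi}(\nabla\phi, t) \geq 0$, and assume that

$$\lambda_{\mathrm{diff}} + \max \frac{\partial f}{\partial\phi}$$

is uniformly negative, where

$$\lambda_{\mathrm{diff}} = -\lambda_{\partial h/\partial\nabla\phi} \sum_{k=1}^{m} \frac{\pi^2}{l_{k,\max}^2} \qquad (2)$$

for a (perhaps time-varying) Dirichlet condition (i.e., φ(t) specified on the boundary), and

$$\lambda_{\mathrm{diff}} = 0 \qquad (3)$$

for a (perhaps time-varying) Neumann condition. Then all system trajectories converge exponentially to a single field, with minimal convergence rate

$$\left|\lambda_{\mathrm{diff}} + \max \frac{\partial f}{\partial\phi}\right|.$$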

In the autonomous case (f = f(φ, x), v constant, and constant boundary conditions), the system converges exponentially to a steady state $\phi_d(x)$, which is the unique solution of the generalized Poisson equation

$$0 = \operatorname{div} h(\nabla\phi_d) + v^T\nabla\phi_d + f(\phi_d, x)$$
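The Dirichlet term in eq. (2) corresponds to the smallest eigenvalue $\pi^2/l^2$ of $-\partial^2/\partial x^2$ with zero boundary values on an interval of length l, summed over the axes. The sketch below evaluates the guaranteed rate $|\lambda_{\mathrm{diff}} + \max \partial f/\partial\phi|$ and checks the $\pi^2/l^2$ value against a finite-difference Laplacian. The function names and the grid size are illustrative assumptions.

```python
import numpy as np


def lambda_diff(lam_h, lengths, boundary="dirichlet"):
    """Eq. (2) for Dirichlet, eq. (3) for Neumann boundary conditions."""
    if boundary == "neumann":
        return 0.0
    return -lam_h * sum(np.pi**2 / l**2 for l in lengths)


def guaranteed_rate(lam_h, lengths, max_df, boundary="dirichlet"):
    """|lambda_diff + max df/dphi|, valid only when the sum is negative."""
    s = lambda_diff(lam_h, lengths, boundary) + max_df
    if s >= 0:
        raise ValueError("not uniformly negative: Theorem 1 gives no guarantee")
    return -s


def smallest_dirichlet_eigenvalue(l, n=400):
    """Smallest eigenvalue of the finite-difference -d^2/dx^2 on (0, l), zero ends."""
    h = l / (n + 1)
    A = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    return np.linalg.eigvalsh(A)[0]  # eigenvalues in ascending order
```

For example, on a unit square with $\lambda_{\partial h/\partial\nabla\phi} = 1$ and f(φ) = -φ, the guaranteed rate is 2π² + 1 under a Dirichlet condition and 1 under a Neumann condition.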



The method of proof implies that all the results on contracting systems in [10] 
can be extended to contracting reaction-diffusion processes, with boundary con- 
ditions acting as additional inputs to the system. For instance, any autonomous 
contracting reaction-diffusion process, when subjected to boundary conditions 
periodic in time, will tend exponentially to a periodic solution of the same period. 
Also, any autonomous contracting reaction-diffusion process will tend exponen- 
tially to a unique steady-state. The convergence is robust to bounded or linearly 
increasing disturbances. The stability guarantees also hold for any orthonormal 
Cartesian discretization of the continuum. Finally, chains or hierarchies of con- 
tracting processes are themselves contracting, and thus converge exponentially, 
allowing multiple levels of stable preprocessing if desired. 
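The following numerical illustration shows the contraction property under a Cartesian discretization. Two copies of the same 1-D reaction-diffusion process start from different fields and are integrated with explicit Euler steps under a zero-flux (Neumann) boundary. This gives $\lambda_{\mathrm{diff}} = 0$, and with f(φ, x) = -(φ - u(x)) the rate in Theorem 1 is 1. The choices h = tanh(∂φ/∂x), which has a non-negative derivative, unit grid spacing, the input u, and the step size are our own assumptions. With unit spacing and dt ≤ 1/3, each Euler step shrinks the Euclidean distance between the copies by at least a factor 1 - dt.

```python
import numpy as np


def step(phi, u, dt):
    """One explicit Euler step of dphi/dt = d/dx tanh(dphi/dx) - (phi - u).

    Unit grid spacing; zero flux through both ends (Neumann condition).
    """
    flux = np.tanh(np.diff(phi))                          # h(grad phi) between nodes
    div = np.diff(np.concatenate(([0.0], flux, [0.0])))   # discrete divergence
    return phi + dt * (div - (phi - u))


rng = np.random.default_rng(1)
u = np.sign(np.sin(np.linspace(0.0, 6.0, 64)))  # hypothetical input "image"
a, b = rng.normal(size=64), 5.0 * rng.normal(size=64)
dt, n_steps = 0.1, 30
d0 = np.linalg.norm(a - b)
for _ in range(n_steps):
    a, b = step(a, u, dt), step(b, u, dt)
ratio = np.linalg.norm(a - b) / d0  # bounded by (1 - dt)**n_steps <= exp(-3)
```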



3 Machine Vision 

The above results can be used in particular in problems of function approximation and nonlinear filtering. One such application is machine vision, where they suggest a different perspective and systematic extensions (notably to time-varying images) for the now classical results of scale-space analysis and anisotropic diffusion (see [1,2,3,5,6,9,13,16,17,18,19,20]). Similar questions occur in models of physiological vision and eye movement [4].

Consider the problem of computing a smooth estimate of a noisy time-varying image $\phi(x,t)$, while preserving meaningful discontinuities such as edges at given scales [20,14]. Define the filter

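$$\frac{\partial\hat\phi}{\partial t} = \operatorname{div} h(\nabla\hat\phi, t) + v^T(t)\,\nabla\hat\phi + f(\hat\phi - \phi) \qquad (4)$$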

$$\tilde\phi = \hat\phi - \phi$$



where v(t) accounts for camera motion. The system thus verifies the nonlinear 
reaction-diffusion-convection equation 

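$$\frac{\partial\hat\phi}{\partial t} - \frac{\partial\phi}{\partial t} = \operatorname{div} h(\nabla\hat\phi, t) + v^T(t)\,\nabla\hat\phi + f(\hat\phi - \phi) - \frac{\partial\phi}{\partial t}$$

Thus the dynamics of the estimation error $\hat\phi - \phi$ contain ∂φ/∂t, although the actual computation is done using equation (4), and hence ∂φ/∂t is not explicitly used.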

According to Theorem 1, global exponential convergence to a unique time-varying image can be guaranteed by choosing the nonlinear function f to be strictly decreasing and the field h to have a positive semi-definite Jacobian. These conditions can be used to shape the performance of the filter design. For instance,

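- Choosing h in the usual form

  $$h = g(\|\nabla\phi\|)\,\nabla\phi$$

  and letting $r = \|\nabla\phi\|$, the corresponding Jacobian is symmetric and can be written

  $$\frac{\partial h}{\partial\nabla\phi} = g\,I + r\,\frac{dg}{dr}\,\frac{\nabla\phi}{\|\nabla\phi\|}\,\frac{\nabla\phi^T}{\|\nabla\phi\|}$$

  Since the largest eigenvalue of the last dyadic product is 1, the eigenvalues of the Jacobian are g and $g + r\,dg/dr = \partial(rg)/\partial r$. The system is therefore globally contracting for $g \geq 0$ and $\partial(rg)/\partial r \geq 0$. One might, e.g., choose $g = \tanh(ar)/(ar)$ or $g = \sin\left(\frac{\pi}{2}\operatorname{sat}(ar)\right)/(ar)$ (with a a constant) to filter small r and leave large r unfiltered. More generally, one may choose g = 0 for specific ranges of r, leaving the corresponding part of the image unfiltered.

- Outliers in φ can be cut off, e.g., with a sigmoidal f.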
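The eigenvalue argument for $h = g(r)\nabla\phi$ can be checked numerically. Its Jacobian has eigenvalue g(r) across the gradient and $\partial(rg)/\partial r$ along it. For $g = \tanh(ar)/(ar)$ these values are $\tanh(ar)/(ar)$ and $\operatorname{sech}^2(ar)$, both non-negative. The sketch below compares a central-difference Jacobian in two dimensions with these values. The constant a = 2 and the test gradient are arbitrary choices.

```python
import numpy as np

A = 2.0  # the constant a (arbitrary choice for this check)


def h(p):
    """Flux h = g(|p|) p with g(r) = tanh(a r) / (a r), and g(0) = 1."""
    r = np.linalg.norm(p)
    g = 1.0 if r == 0.0 else np.tanh(A * r) / (A * r)
    return g * p


def jacobian(p, eps=1e-6):
    """Central-difference Jacobian dh/dp."""
    J = np.empty((p.size, p.size))
    for j in range(p.size):
        e = np.zeros(p.size)
        e[j] = eps
        J[:, j] = (h(p + e) - h(p - e)) / (2.0 * eps)
    return J


p = np.array([0.3, -0.4])                       # gradient with r = 0.5
J = jacobian(p)
numeric = np.linalg.eigvalsh(0.5 * (J + J.T))   # ascending order
ar = A * np.linalg.norm(p)
predicted = np.sort([np.tanh(ar) / ar, 1.0 / np.cosh(ar) ** 2])
```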

Furthermore, chains or hierarchies of contracting processes are themselves contracting, as mentioned earlier. This implies, for instance, that the velocity v above, the input coupling f, or the time dependence of h (e.g., the choice of threshold) could themselves result from “higher” contracting processes. Similarly, prefiltering of the signal in time or space can be incorporated straightforwardly.
Finally, note that if v is actually unknown but constant or slowly varying, it is straightforward to design versions of the above that are adaptive in v.

4 Concluding Remarks 

This paper exploits the contraction properties of nonlinear reaction-diffusion- 
convection equations to suggest a different perspective on classical results on 
nonlinear diffusion for machine vision. In particular, it explicitly quantifies global 
exponential convergence rates, and extends systematically to stable nonlinear 
diffusion of time-varying images. Relationships between noise models and specific choices of f and h, or of higher contracting processes, need to be studied further. Additional flexibility may also be obtained by shaping the system metric while still preserving global exponential stability [10], and could parallel recent developments [8]. Numerical implementation on actual video images will be presented at the conference.

Acknowledgements: We would like to thank Martin Grepl for the numerical 
simulations. 
Fig. 1. Applying the basic algorithm to the “pulsating square” illusion. Original and filtered images at time t = 1, 2, 3, using a sampling rate of 1/20. Note that in such an observer design, each new image is processed in only one iteration step.